    [1]:
     
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
import plotly.express as px

from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
    [2]:
     
    pd.set_option('display.max_colwidth', None)
    import configparser
    config = configparser.ConfigParser()
    config.read('env.ini')
    data_home = config['DEFAULT']['data_home']
    output_dir = config['DEFAULT']['output_dir']
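The `env.ini` file itself isn't shown in the notebook; a minimal sketch of the shape `configparser` expects here, with placeholder paths (the real `data_home`/`output_dir` values are unknown):

```python
import configparser

# Hypothetical env.ini contents -- placeholder paths, not the author's actual ones.
sample_ini = """\
[DEFAULT]
data_home = ./data
output_dir = ./output
"""

config = configparser.ConfigParser()
config.read_string(sample_ini)  # read_string stands in for config.read('env.ini')
print(config['DEFAULT']['data_home'], config['DEFAULT']['output_dir'])
```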
    [3]:
     
    data_prefix = 'entrepreneur'
    colors = "YlGnBu"
    [4]:
     
ngram_range = (1, 2)
n_terms = 4000
n_topics = 40
max_iter = 20
n_top_terms = 9

OHCO = ['screenplay_id', 'scene_id', 'para_num', 'sent_num', 'token_num']
PARA = OHCO[:3]
SCENE = OHCO[:2]
SCREENPLAY = OHCO[:1]
    [5]:
     
    BAG = SCENE
    import warnings
    warnings.filterwarnings('ignore')
    [6]:
     
    TOKENS = pd.read_csv(f'{output_dir}/{data_prefix}-TOKEN.csv').set_index(OHCO)
    TOKENS.head()
    [6]:
                                                        pos_tuple  pos  token_str  term_str  pos_group
    screenplay_id  scene_id  para_num  sent_num  token_num
    joy            1         0         0         0      ('The', 'DT')   DT        The       the         DT
                                                 1  ('kitchen', 'NN')   NN    kitchen   kitchen         NN
                                                 2       ('of', 'IN')   IN         of        of         IN
                                                 3        ('a', 'DT')   DT          a         a         DT
                                                 4    ('drive', 'NN')   NN      drive     drive         NN
    # Prep for LDA

    [7]:
     
    DOCS = TOKENS[TOKENS.pos.str.match(r'^NNS?$')]\
        .groupby(BAG).term_str\
        .apply(lambda x: ' '.join(map(str,x)))\
        .to_frame()\
        .rename(columns={'term_str':'doc_str'})
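Cell [7] keeps only common nouns (`NN`/`NNS`) and concatenates each scene's nouns into one space-joined pseudo-document. The same recipe on a toy frame (hypothetical tokens, not from the corpus):

```python
import pandas as pd

# Toy TOKENS-like frame; the index mirrors OHCO truncated to the SCENE bag.
toy = pd.DataFrame({
    'pos':      ['DT', 'NN', 'NNS', 'VB', 'NN'],
    'term_str': ['the', 'kitchen', 'doors', 'open', 'drive'],
}, index=pd.MultiIndex.from_tuples(
    [('joy', 1), ('joy', 1), ('joy', 1), ('joy', 2), ('joy', 2)],
    names=['screenplay_id', 'scene_id']))

# Keep singular/plural common nouns (NN, NNS), then join each scene's nouns.
docs = toy[toy.pos.str.match(r'^NNS?$')]\
    .groupby(['screenplay_id', 'scene_id']).term_str\
    .apply(' '.join)\
    .to_frame('doc_str')
print(docs.doc_str.tolist())  # ['kitchen doors', 'drive']
```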
    [8]:
     
    count_engine = CountVectorizer(max_features=n_terms, ngram_range=ngram_range, stop_words='english')
    count_model = count_engine.fit_transform(DOCS.doc_str)
    TERMS = count_engine.get_feature_names_out()
    [9]:
     
    VOCAB = pd.DataFrame(index=TERMS)
    VOCAB.index.name = 'term_str'
    [10]:
     
    DTM = pd.DataFrame(count_model.toarray(), index=DOCS.index, columns=TERMS)
    DTM
    [10]:
                                  05  1350000  aback  ability  ...  youll  youre  youre beat  youre gonna  youre right  youve  yule  zero  òmó  ôem
    screenplay_id       scene_id
    joy                 1          0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
                        2          0        0      0        1  ...      0      0           0            0            0      0     0     0    0    0
                        4          0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
                        6          0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
                        7          0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
    ...                 ...      ...      ...    ...      ...  ...    ...    ...         ...          ...          ...    ...   ...   ...  ...  ...
    the_social_network  569        0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
                        572        0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0
                        573        0        0      0        0  ...      0      0           0            0            0      1     0     0    0    0
                        574        0        0      0        0  ...      0      1           0            0            0      0     0     0    0    0
                        575        0        0      0        0  ...      0      0           0            0            0      0     0     0    0    0

    1846 rows × 4000 columns

    [11]:
     
    VOCAB['doc_count'] = DTM.astype('bool').astype('int').sum()
    DOCS['term_count'] = DTM.sum(1)
    [12]:
     
    DOCS.term_count.describe()
    [12]:
    count    1846.000000
    mean       14.210184
    std        13.311210
    min         0.000000
    25%         4.000000
    50%        11.000000
    75%        21.000000
    max       110.000000
    Name: term_count, dtype: float64
    [13]:
     
    lda_engine = LDA(n_components=n_topics, max_iter=max_iter, learning_offset=50., random_state=0)
    [14]:
     
    TNAMES = [f"T{str(x).zfill(len(str(n_topics)))}" for x in range(n_topics)]
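`zfill` pads each topic number to the width of `n_topics`, so the names sort lexicographically in numeric order (which keeps the styled tables below in T00…T39 order). A quick check with the same expression:

```python
n_topics = 40
# Pad to len(str(40)) == 2 digits: T00, T01, ..., T39.
TNAMES = [f"T{str(x).zfill(len(str(n_topics)))}" for x in range(n_topics)]
print(TNAMES[0], TNAMES[-1])  # T00 T39
# Equal-width names mean lexicographic order == numeric order.
assert sorted(TNAMES) == TNAMES
```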
    [15]:
     
    lda_model = lda_engine.fit_transform(count_model)
    ## THETA

    [16]:
     
THETA = pd.DataFrame(lda_model, index=DOCS.index, columns=TNAMES)
THETA.columns.name = 'topic_id'  # set the name after assigning columns, or it is overwritten
    [17]:
     
    THETA.sample(10).T.style.background_gradient(cmap=colors, axis=None)
    [17]:
    screenplay_id the_big_short the_social_network the_help steve_jobs the_founder steve_jobs the_founder
    scene_id 5 485 418 253 742 655 661 166 810 9
    T00 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T01 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.510390 0.005000
    T02 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T03 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T04 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.860714 0.001471 0.005000
    T05 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T06 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T07 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T08 0.000481 0.434657 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T09 0.000481 0.001786 0.025000 0.005000 0.012500 0.351972 0.003125 0.003571 0.001471 0.005000
    T10 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T11 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T12 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T13 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T14 0.000481 0.497486 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T15 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T16 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T17 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.433728 0.005000
    T18 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T19 0.981250 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T20 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T21 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T22 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T23 0.000481 0.001786 0.025000 0.805000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T24 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T25 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T26 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T27 0.000481 0.001786 0.025000 0.005000 0.512500 0.003571 0.003125 0.003571 0.001471 0.005000
    T28 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T29 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.805000
    T30 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T31 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T32 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T33 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T34 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.878125 0.003571 0.001471 0.005000
    T35 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T36 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T37 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    T38 0.000481 0.001786 0.025000 0.005000 0.012500 0.512314 0.003125 0.003571 0.001471 0.005000
    T39 0.000481 0.001786 0.025000 0.005000 0.012500 0.003571 0.003125 0.003571 0.001471 0.005000
    ## PHI

    [18]:
     
    PHI = pd.DataFrame(lda_engine.components_, columns=TERMS, index=TNAMES)
    PHI.index.name = 'topic_id'
    PHI.columns.name = 'term_str'
    [19]:
     
    PHI.T.sample(10).style.background_gradient(cmap=colors, axis=None)
    [19]:
    topic_id T00 T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16 T17 T18 T19 T20 T21 T22 T23 T24 T25 T26 T27 T28 T29 T30 T31 T32 T33 T34 T35 T36 T37 T38 T39
    term_str                                                                                
    plan 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 3.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 3.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000
    dining 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000 0.025000 0.025000 0.025000 4.025000 0.025000 0.025000 0.025000 1.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000 2.025000 0.025000 0.025000 5.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000
    jared 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 4.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000
    stories skeeter 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000
    dress mother 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000
    pennies heaven 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 4.025000 0.025000
    shots 0.025000 0.025000 0.025000 0.025000 4.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000 0.025000
    ship date 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000
    drift busy 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000
    bags 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 2.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 0.025000 1.025000 0.025000
    ## TOPICS

    [20]:
     
    TOPICS = PHI.stack().groupby('topic_id')\
        .apply(lambda x: ' '.join(x.sort_values(ascending=False).head(n_top_terms).reset_index().term_str))\
        .to_frame('top_terms')
    [21]:
     
    TOPICS.head()
    [21]:
              top_terms
    topic_id
    T00       swaps sign place glass loans dont money golf default swaps
    T01       right people book hand vo way arches building deal
    T02       room kitchen living living room people house time way world
    T03       vo school computer people money gonna brother right room
    T04       contd sorry money men youre summers sir skeeter today
    [22]:
     
    TOPICS['doc_weight_sum'] = THETA.sum()
    TOPICS['term_freq'] = PHI.sum(1) / PHI.sum(1).sum()
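`term_freq` normalizes each topic's total pseudo-count from `PHI` into a share of the corpus-wide total, i.e. a distribution over topics. On a toy 2×2 `PHI` (hypothetical counts):

```python
import pandas as pd

# Toy PHI-like matrix (topics × terms) of pseudo-counts.
phi = pd.DataFrame([[2.0, 1.0], [1.0, 4.0]], index=['T0', 'T1'])

# Each topic's row sum divided by the grand total -> a distribution over topics.
term_freq = phi.sum(1) / phi.sum(1).sum()
print(term_freq.tolist())  # [0.375, 0.625]
assert abs(term_freq.sum() - 1.0) < 1e-12
```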
    [23]:
     
    TOPICS.sort_values('doc_weight_sum', ascending=False).style.background_gradient(cmap=colors)
    [23]:
      top_terms doc_weight_sum term_freq
    topic_id      
    T27 continued lot arches angles parking lot parking cover restaurant things 118.237774 0.025378
    T19 bonds mortgage banks car mortgage bonds housing market mortgages shit 62.955227 0.042527
    T30 computer room moment door skeeter conference hand hands time 57.578945 0.029280
    T25 yule skeeter os door map phone table book desk 55.382689 0.026782
    T09 door board house way man world time bed record 53.224913 0.028538
    T15 people end dress vo day right way business party 52.469261 0.027682
    T20 girls kitchen door room skeeter guys morning nods end 52.465449 0.030366
    T16 os phone time right people ll door way bank 52.104587 0.030608
    T06 thing gonna business people beat time summers youre milkshake 49.394011 0.032061
    T04 contd sorry money men youre summers sir skeeter today 47.866102 0.026326
    T13 gage time team share lot room ownership share ownership computer 47.560548 0.021216
    T39 car living money people time room table bedroom job 47.529435 0.024985
    T07 food lawyer page beat cut face chair drive chicken 47.381610 0.024287
    T23 house beat money os hand line country water mother 46.924828 0.028691
    T31 phone things losses mortgage hand swaps people bond department 46.836768 0.025905
    T37 door stairs race day hallway people room map thats 46.691889 0.023187
    T05 phone os couple hour look years thing car burger 46.445381 0.030379
    T03 vo school computer people money gonna brother right room 46.253560 0.029903
    T02 room kitchen living living room people house time way world 44.456282 0.024038
    T35 door bathroom toilet computer os letter deal bathroom door money 44.089767 0.023956
    T38 door office time piano years people pool bag desks 43.968562 0.025484
    T22 people time lot shirt site theyre way end bet 43.367253 0.029525
    T33 loan yule head purse home officer police house things 43.234263 0.023981
    T32 company time party kind thing phone change building table 42.984995 0.020344
    T12 phone vo cell cell phone office shakes beat milk milk shakes 42.073192 0.027820
    T18 eyes end hand skeeter man kitchen line restaurant town 42.030787 0.028389
    T01 right people book hand vo way arches building deal 42.022960 0.020859
    T10 chicken bun kitchen way room face product smile skeeter 41.701578 0.021272
    T34 arches place house hand vanilla yule things thing pair 40.998628 0.027419
    T11 letter business right time hell money summers table sorry 40.799953 0.026187
    T08 land house beat years bed sound minutes sure ticket 39.527795 0.017747
    T00 swaps sign place glass loans dont money golf default swaps 38.663781 0.026031
    T36 tray table cover smile sight employees glance men time 36.700751 0.022375
    T26 thing idea beat support car skeeter coat silence right 36.357623 0.019120
    T29 line finger people years days food end store restaurant 36.117518 0.016992
    T17 room idea launch slots test people members company hundreds 36.032378 0.020834
    T21 dollars session head drums cream cooler ice cream ice businessman 35.242060 0.018704
    T28 table os room stock screen bottle glass shares people 34.537674 0.018273
    T14 time right stand man dog size left fight office 34.077687 0.020247
    T24 news late cheerleaders things woman store window drinks eduardo 29.711536 0.012302
    [24]:
     
cols = TOPICS.columns[1:].to_list()
r = TOPICS[cols].corr().iloc[0, 1]  # off-diagonal: correlation of doc_weight_sum vs term_freq (iloc[1,1] is the diagonal, always 1)
    [25]:
     
TOPICS.plot.scatter(*cols, title=f"r = {r:.3f}");
    [26]:
     
    TOPICS.sort_values('doc_weight_sum', ascending=True).plot.barh(y='doc_weight_sum', x='top_terms', figsize=(5, n_topics/2));
    ## LDA + Visualizations

    [27]:
     
    LIB = pd.read_csv(f'{output_dir}/{data_prefix}-LIB.csv').set_index('screenplay_id')
    [28]:
     
    LIB['title_key'] = LIB.raw_title.str.split(', ').str[0].str.lower()
    [29]:
     
    TITLES = sorted(LIB.title_key.value_counts().index.to_list())
    [30]:
     
    TOPICS[TITLES] = THETA.join(LIB, on='screenplay_id').groupby('title_key')[TNAMES].mean().T
    [31]:
     
    TOPICS[TITLES + ['top_terms']].style.background_gradient(cmap=colors, axis=None)
    [31]:
      joy steve jobs the big short the founder the help the social network top_terms
    topic_id              
    T00 0.026062 0.012994 0.034163 0.029206 0.018490 0.016646 swaps sign place glass loans dont money golf default swaps
    T01 0.014143 0.022036 0.024812 0.017367 0.031259 0.024446 right people book hand vo way arches building deal
    T02 0.024674 0.025316 0.027253 0.023974 0.033112 0.011998 room kitchen living living room people house time way world
    T03 0.011629 0.032663 0.039859 0.011350 0.014803 0.034096 vo school computer people money gonna brother right room
    T04 0.024664 0.020378 0.025004 0.023964 0.024454 0.037505 contd sorry money men youre summers sir skeeter today
    T05 0.027775 0.028449 0.025132 0.028991 0.016570 0.024112 phone os couple hour look years thing car burger
    T06 0.024883 0.029924 0.018924 0.024176 0.013333 0.042491 thing gonna business people beat time summers youre milkshake
    T07 0.032812 0.013282 0.009825 0.035572 0.023684 0.041719 food lawyer page beat cut face chair drive chicken
    T08 0.033579 0.020178 0.019383 0.036120 0.008286 0.017613 land house beat years bed sound minutes sure ticket
    T09 0.014145 0.047853 0.031476 0.015724 0.027932 0.021780 door board house way man world time bed record
    T10 0.024401 0.015546 0.023332 0.023710 0.023875 0.028400 chicken bun kitchen way room face product smile skeeter
    T11 0.022318 0.013515 0.014945 0.023672 0.035683 0.024285 letter business right time hell money summers table sorry
    T12 0.020288 0.008833 0.046225 0.019729 0.025477 0.028899 phone vo cell cell phone office shakes beat milk milk shakes
    T13 0.015877 0.025338 0.016032 0.015461 0.026719 0.045244 gage time team share lot room ownership share ownership computer
    T14 0.022591 0.016669 0.013812 0.021958 0.009169 0.026822 time right stand man dog size left fight office
    T15 0.016215 0.023402 0.048796 0.015788 0.038509 0.030980 people end dress vo day right way business party
    T16 0.020478 0.033898 0.041566 0.019913 0.031401 0.020871 os phone time right people ll door way bank
    T17 0.018474 0.029078 0.013033 0.017975 0.012779 0.018512 room idea launch slots test people members company hundreds
    T18 0.030413 0.010789 0.005931 0.035517 0.041268 0.017908 eyes end hand skeeter man kitchen line restaurant town
    T19 0.035656 0.013304 0.125312 0.034602 0.023848 0.015376 bonds mortgage banks car mortgage bonds housing market mortgages shit
    T20 0.025562 0.020354 0.018860 0.025871 0.047382 0.031439 girls kitchen door room skeeter guys morning nods end
    T21 0.025985 0.009411 0.032583 0.025242 0.017884 0.016087 dollars session head drums cream cooler ice cream ice businessman
    T22 0.010604 0.023082 0.020025 0.010358 0.022144 0.045332 people time lot shirt site theyre way end bet
    T23 0.033485 0.018882 0.031301 0.033236 0.024106 0.020884 house beat money os hand line country water mother
    T24 0.020953 0.011309 0.021547 0.020373 0.006978 0.021231 news late cheerleaders things woman store window drinks eduardo
    T25 0.030883 0.009680 0.013166 0.029983 0.067089 0.032981 yule skeeter os door map phone table book desk
    T26 0.032564 0.015063 0.010384 0.031609 0.020522 0.013597 thing idea beat support car skeeter coat silence right
    T27 0.025960 0.191956 0.009319 0.025218 0.014161 0.023927 continued lot arches angles parking lot parking cover restaurant things
    T28 0.020003 0.023017 0.016693 0.019454 0.011275 0.019501 table os room stock screen bottle glass shares people
    T29 0.028367 0.018597 0.010126 0.027548 0.019728 0.014765 line finger people years days food end store restaurant
    T30 0.029478 0.030842 0.018258 0.028623 0.040379 0.034004 computer room moment door skeeter conference hand hands time
    T31 0.022496 0.015631 0.068883 0.021866 0.017662 0.023845 phone things losses mortgage hand swaps people bond department
    T32 0.025468 0.021402 0.010975 0.024743 0.009370 0.043314 company time party kind thing phone change building table
    T33 0.030723 0.023898 0.007847 0.029828 0.035636 0.011515 loan yule head purse home officer police house things
    T34 0.025884 0.022744 0.009850 0.025145 0.025673 0.021171 arches place house hand vanilla yule things thing pair
    T35 0.029846 0.021929 0.021063 0.028980 0.028423 0.016429 door bathroom toilet computer os letter deal bathroom door money
    T36 0.037666 0.011187 0.009207 0.036547 0.019457 0.014496 tray table cover smile sight employees glance men time
    T37 0.025072 0.022732 0.023155 0.024360 0.029413 0.027064 door stairs race day hallway people room map thats
    T38 0.017601 0.021155 0.024288 0.017130 0.038474 0.022705 door office time piano years people pool bag desks
    T39 0.040322 0.023682 0.017657 0.039117 0.023590 0.016010 car living money people time room table bedroom job
    [32]:
     
    TOPICS['title'] = TOPICS[TITLES].idxmax(1)
    TOPICS.sort_values(['title','doc_weight_sum'], ascending=[True,False]).style.background_gradient(cmap=colors)
    [32]:
      top_terms doc_weight_sum term_freq joy steve jobs the big short the founder the help the social network title
    topic_id                    
    T39 car living money people time room table bedroom job 47.529435 0.024985 0.040322 0.023682 0.017657 0.039117 0.023590 0.016010 joy
    T23 house beat money os hand line country water mother 46.924828 0.028691 0.033485 0.018882 0.031301 0.033236 0.024106 0.020884 joy
    T35 door bathroom toilet computer os letter deal bathroom door money 44.089767 0.023956 0.029846 0.021929 0.021063 0.028980 0.028423 0.016429 joy
    T34 arches place house hand vanilla yule things thing pair 40.998628 0.027419 0.025884 0.022744 0.009850 0.025145 0.025673 0.021171 joy
    T36 tray table cover smile sight employees glance men time 36.700751 0.022375 0.037666 0.011187 0.009207 0.036547 0.019457 0.014496 joy
    T26 thing idea beat support car skeeter coat silence right 36.357623 0.019120 0.032564 0.015063 0.010384 0.031609 0.020522 0.013597 joy
    T29 line finger people years days food end store restaurant 36.117518 0.016992 0.028367 0.018597 0.010126 0.027548 0.019728 0.014765 joy
    T27 continued lot arches angles parking lot parking cover restaurant things 118.237774 0.025378 0.025960 0.191956 0.009319 0.025218 0.014161 0.023927 steve jobs
    T09 door board house way man world time bed record 53.224913 0.028538 0.014145 0.047853 0.031476 0.015724 0.027932 0.021780 steve jobs
    T17 room idea launch slots test people members company hundreds 36.032378 0.020834 0.018474 0.029078 0.013033 0.017975 0.012779 0.018512 steve jobs
    T28 table os room stock screen bottle glass shares people 34.537674 0.018273 0.020003 0.023017 0.016693 0.019454 0.011275 0.019501 steve jobs
    T19 bonds mortgage banks car mortgage bonds housing market mortgages shit 62.955227 0.042527 0.035656 0.013304 0.125312 0.034602 0.023848 0.015376 the big short
    T15 people end dress vo day right way business party 52.469261 0.027682 0.016215 0.023402 0.048796 0.015788 0.038509 0.030980 the big short
    T16 os phone time right people ll door way bank 52.104587 0.030608 0.020478 0.033898 0.041566 0.019913 0.031401 0.020871 the big short
    T31 phone things losses mortgage hand swaps people bond department 46.836768 0.025905 0.022496 0.015631 0.068883 0.021866 0.017662 0.023845 the big short
    T03 vo school computer people money gonna brother right room 46.253560 0.029903 0.011629 0.032663 0.039859 0.011350 0.014803 0.034096 the big short
    T12 phone vo cell cell phone office shakes beat milk milk shakes 42.073192 0.027820 0.020288 0.008833 0.046225 0.019729 0.025477 0.028899 the big short
    T00 swaps sign place glass loans dont money golf default swaps 38.663781 0.026031 0.026062 0.012994 0.034163 0.029206 0.018490 0.016646 the big short
    T21 dollars session head drums cream cooler ice cream ice businessman 35.242060 0.018704 0.025985 0.009411 0.032583 0.025242 0.017884 0.016087 the big short
    T24 news late cheerleaders things woman store window drinks eduardo 29.711536 0.012302 0.020953 0.011309 0.021547 0.020373 0.006978 0.021231 the big short
    T05 phone os couple hour look years thing car burger 46.445381 0.030379 0.027775 0.028449 0.025132 0.028991 0.016570 0.024112 the founder
    T08 land house beat years bed sound minutes sure ticket 39.527795 0.017747 0.033579 0.020178 0.019383 0.036120 0.008286 0.017613 the founder
    T30 computer room moment door skeeter conference hand hands time 57.578945 0.029280 0.029478 0.030842 0.018258 0.028623 0.040379 0.034004 the help
    T25 yule skeeter os door map phone table book desk 55.382689 0.026782 0.030883 0.009680 0.013166 0.029983 0.067089 0.032981 the help
    T20 girls kitchen door room skeeter guys morning nods end 52.465449 0.030366 0.025562 0.020354 0.018860 0.025871 0.047382 0.031439 the help
    T37 door stairs race day hallway people room map thats 46.691889 0.023187 0.025072 0.022732 0.023155 0.024360 0.029413 0.027064 the help
    T02 room kitchen living living room people house time way world 44.456282 0.024038 0.024674 0.025316 0.027253 0.023974 0.033112 0.011998 the help
    T38 door office time piano years people pool bag desks 43.968562 0.025484 0.017601 0.021155 0.024288 0.017130 0.038474 0.022705 the help
    T33 loan yule head purse home officer police house things 43.234263 0.023981 0.030723 0.023898 0.007847 0.029828 0.035636 0.011515 the help
    T18 eyes end hand skeeter man kitchen line restaurant town 42.030787 0.028389 0.030413 0.010789 0.005931 0.035517 0.041268 0.017908 the help
    T01 right people book hand vo way arches building deal 42.022960 0.020859 0.014143 0.022036 0.024812 0.017367 0.031259 0.024446 the help
    T11 letter business right time hell money summers table sorry 40.799953 0.026187 0.022318 0.013515 0.014945 0.023672 0.035683 0.024285 the help
    T06 thing gonna business people beat time summers youre milkshake 49.394011 0.032061 0.024883 0.029924 0.018924 0.024176 0.013333 0.042491 the social network
    T04 contd sorry money men youre summers sir skeeter today 47.866102 0.026326 0.024664 0.020378 0.025004 0.023964 0.024454 0.037505 the social network
    T13 gage time team share lot room ownership share ownership computer 47.560548 0.021216 0.015877 0.025338 0.016032 0.015461 0.026719 0.045244 the social network
    T07 food lawyer page beat cut face chair drive chicken 47.381610 0.024287 0.032812 0.013282 0.009825 0.035572 0.023684 0.041719 the social network
    T22 people time lot shirt site theyre way end bet 43.367253 0.029525 0.010604 0.023082 0.020025 0.010358 0.022144 0.045332 the social network
    T32 company time party kind thing phone change building table 42.984995 0.020344 0.025468 0.021402 0.010975 0.024743 0.009370 0.043314 the social network
    T10 chicken bun kitchen way room face product smile skeeter 41.701578 0.021272 0.024401 0.015546 0.023332 0.023710 0.023875 0.028400 the social network
    T14 time right stand man dog size left fight office 34.077687 0.020247 0.022591 0.016669 0.013812 0.021958 0.009169 0.026822 the social network
    [33]:
     
    from scipy.spatial.distance import pdist
    [34]:
     
    tpairs_idx = [(a, b) for a, b in pd.MultiIndex.from_product([TOPICS.index, TOPICS.index]) if a < b]
    [35]:
     
    TPAIRS = pd.DataFrame(tpairs_idx, columns=['topic_id_x', 'topic_id_y']).set_index(['topic_id_x', 'topic_id_y'])
    [36]:
     
    TPAIRS['theta_cityblock'] = pdist(THETA.T, 'cityblock')
    TPAIRS['theta_cosine'] = pdist(THETA.T, 'cosine')
    TPAIRS['theta_canberra'] = pdist(THETA.T, 'canberra')
    TPAIRS['theta_jaccard'] = pdist(THETA.T, 'jaccard')
    TPAIRS['theta_js'] = pdist(THETA.T, 'jensenshannon')
    [37]:
     
    TPAIRS['phi_cityblock'] = pdist(PHI, 'cityblock')
    TPAIRS['phi_cosine'] = pdist(PHI, 'cosine')
    TPAIRS['phi_canberra'] = pdist(PHI, 'canberra')
    TPAIRS['phi_jaccard'] = pdist(PHI, 'jaccard')
    TPAIRS['phi_js'] = pdist(PHI, 'jensenshannon')
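`pdist` returns distances in condensed form, ordered exactly like the `(a, b)` pairs with `a < b` built in cell [34], which is why the columns can be assigned directly. A toy check with hypothetical points:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

labels = ['T0', 'T1', 'T2']
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Same filtered-product construction as tpairs_idx above.
pairs = [(a, b) for a, b in pd.MultiIndex.from_product([labels, labels]) if a < b]
d = pdist(X, 'cityblock')

# Condensed order is (T0,T1), (T0,T2), (T1,T2) -- matching `pairs` one-to-one.
print(pairs)       # [('T0', 'T1'), ('T0', 'T2'), ('T1', 'T2')]
print(d.tolist())  # [7.0, 14.0, 7.0]
assert squareform(d)[0, 1] == d[0]
```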
    [38]:
     
import seaborn as sns; sns.set_theme()  # pandas/numpy/plotly already imported in cell [1]
sns.pairplot(TPAIRS);
    ## PHI PCA

    [39]:
     
    pca_engine_phi = PCA(4)
    PHI_COMPS = pd.DataFrame(pca_engine_phi.fit_transform(normalize(PHI, norm='l2', axis=1)), index=PHI.index)
    TOPICS
    [39]:
    top_terms doc_weight_sum term_freq joy steve jobs the big short the founder the help the social network title
    topic_id
    T00 swaps sign place glass loans dont money golf default swaps 38.663781 0.026031 0.026062 0.012994 0.034163 0.029206 0.018490 0.016646 the big short
    T01 right people book hand vo way arches building deal 42.022960 0.020859 0.014143 0.022036 0.024812 0.017367 0.031259 0.024446 the help
    T02 room kitchen living living room people house time way world 44.456282 0.024038 0.024674 0.025316 0.027253 0.023974 0.033112 0.011998 the help
    T03 vo school computer people money gonna brother right room 46.253560 0.029903 0.011629 0.032663 0.039859 0.011350 0.014803 0.034096 the big short
    T04 contd sorry money men youre summers sir skeeter today 47.866102 0.026326 0.024664 0.020378 0.025004 0.023964 0.024454 0.037505 the social network
    T05 phone os couple hour look years thing car burger 46.445381 0.030379 0.027775 0.028449 0.025132 0.028991 0.016570 0.024112 the founder
    T06 thing gonna business people beat time summers youre milkshake 49.394011 0.032061 0.024883 0.029924 0.018924 0.024176 0.013333 0.042491 the social network
    T07 food lawyer page beat cut face chair drive chicken 47.381610 0.024287 0.032812 0.013282 0.009825 0.035572 0.023684 0.041719 the social network
    T08 land house beat years bed sound minutes sure ticket 39.527795 0.017747 0.033579 0.020178 0.019383 0.036120 0.008286 0.017613 the founder
    T09 door board house way man world time bed record 53.224913 0.028538 0.014145 0.047853 0.031476 0.015724 0.027932 0.021780 steve jobs
    T10 chicken bun kitchen way room face product smile skeeter 41.701578 0.021272 0.024401 0.015546 0.023332 0.023710 0.023875 0.028400 the social network
    T11 letter business right time hell money summers table sorry 40.799953 0.026187 0.022318 0.013515 0.014945 0.023672 0.035683 0.024285 the help
    T12 phone vo cell cell phone office shakes beat milk milk shakes 42.073192 0.027820 0.020288 0.008833 0.046225 0.019729 0.025477 0.028899 the big short
    T13 gage time team share lot room ownership share ownership computer 47.560548 0.021216 0.015877 0.025338 0.016032 0.015461 0.026719 0.045244 the social network
    T14 time right stand man dog size left fight office 34.077687 0.020247 0.022591 0.016669 0.013812 0.021958 0.009169 0.026822 the social network
    T15 people end dress vo day right way business party 52.469261 0.027682 0.016215 0.023402 0.048796 0.015788 0.038509 0.030980 the big short
    T16 os phone time right people ll door way bank 52.104587 0.030608 0.020478 0.033898 0.041566 0.019913 0.031401 0.020871 the big short
    T17 room idea launch slots test people members company hundreds 36.032378 0.020834 0.018474 0.029078 0.013033 0.017975 0.012779 0.018512 steve jobs
    T18 eyes end hand skeeter man kitchen line restaurant town 42.030787 0.028389 0.030413 0.010789 0.005931 0.035517 0.041268 0.017908 the help
    T19 bonds mortgage banks car mortgage bonds housing market mortgages shit 62.955227 0.042527 0.035656 0.013304 0.125312 0.034602 0.023848 0.015376 the big short
    T20 girls kitchen door room skeeter guys morning nods end 52.465449 0.030366 0.025562 0.020354 0.018860 0.025871 0.047382 0.031439 the help
    T21 dollars session head drums cream cooler ice cream ice businessman 35.242060 0.018704 0.025985 0.009411 0.032583 0.025242 0.017884 0.016087 the big short
    T22 people time lot shirt site theyre way end bet 43.367253 0.029525 0.010604 0.023082 0.020025 0.010358 0.022144 0.045332 the social network
    T23 house beat money os hand line country water mother 46.924828 0.028691 0.033485 0.018882 0.031301 0.033236 0.024106 0.020884 joy
    T24 news late cheerleaders things woman store window drinks eduardo 29.711536 0.012302 0.020953 0.011309 0.021547 0.020373 0.006978 0.021231 the big short
    T25 yule skeeter os door map phone table book desk 55.382689 0.026782 0.030883 0.009680 0.013166 0.029983 0.067089 0.032981 the help
    T26 thing idea beat support car skeeter coat silence right 36.357623 0.019120 0.032564 0.015063 0.010384 0.031609 0.020522 0.013597 joy
    T27 continued lot arches angles parking lot parking cover restaurant things 118.237774 0.025378 0.025960 0.191956 0.009319 0.025218 0.014161 0.023927 steve jobs
    T28 table os room stock screen bottle glass shares people 34.537674 0.018273 0.020003 0.023017 0.016693 0.019454 0.011275 0.019501 steve jobs
    T29 line finger people years days food end store restaurant 36.117518 0.016992 0.028367 0.018597 0.010126 0.027548 0.019728 0.014765 joy
    T30 computer room moment door skeeter conference hand hands time 57.578945 0.029280 0.029478 0.030842 0.018258 0.028623 0.040379 0.034004 the help
    T31 phone things losses mortgage hand swaps people bond department 46.836768 0.025905 0.022496 0.015631 0.068883 0.021866 0.017662 0.023845 the big short
    T32 company time party kind thing phone change building table 42.984995 0.020344 0.025468 0.021402 0.010975 0.024743 0.009370 0.043314 the social network
    T33 loan yule head purse home officer police house things 43.234263 0.023981 0.030723 0.023898 0.007847 0.029828 0.035636 0.011515 the help
    T34 arches place house hand vanilla yule things thing pair 40.998628 0.027419 0.025884 0.022744 0.009850 0.025145 0.025673 0.021171 joy
    T35 door bathroom toilet computer os letter deal bathroom door money 44.089767 0.023956 0.029846 0.021929 0.021063 0.028980 0.028423 0.016429 joy
    T36 tray table cover smile sight employees glance men time 36.700751 0.022375 0.037666 0.011187 0.009207 0.036547 0.019457 0.014496 joy
    T37 door stairs race day hallway people room map thats 46.691889 0.023187 0.025072 0.022732 0.023155 0.024360 0.029413 0.027064 the help
    T38 door office time piano years people pool bag desks 43.968562 0.025484 0.017601 0.021155 0.024288 0.017130 0.038474 0.022705 the help
    T39 car living money people time room table bedroom job 47.529435 0.024985 0.040322 0.023682 0.017657 0.039117 0.023590 0.016010 joy
    [52]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Prepare data
    df = PHI_COMPS.reset_index().copy()
    df['size'] = TOPICS['term_freq'].values
    df['color'] = TOPICS['title'].values
    df['hover'] = TOPICS['doc_weight_sum'].values
    ​
    plt.figure(figsize=(10, 8))
    ​
    # Create scatter plot
    sns.scatterplot(
        data=df, x=0, y=1,
        hue='color', size='size',
        sizes=(20, 400), alpha=0.6,
        legend='brief'
    )
    ​
    # Add topic_id labels near points
    for _, row in df.iterrows():
        plt.text(row[0]+0.1, row[1]+0.1, str(row['topic_id']), fontsize=9)
    ​
    plt.xlabel("Component 0")
    plt.ylabel("Component 1")
    plt.title("Topic Scatter by Term Frequency and Document Weight")
    plt.tight_layout()
    plt.show()
    [40]:
     
    px.scatter(PHI_COMPS.reset_index(), 0, 1, 
               size=TOPICS.term_freq, 
               color=TOPICS.title, 
               text='topic_id', hover_name=TOPICS.doc_weight_sum, height=600, width=700)
    [41]:
     
    PHI_LOADINGS = pd.DataFrame(pca_engine_phi.components_.T * np.sqrt(pca_engine_phi.explained_variance_), index=PHI.T.index)
    PHI_LOADINGS.index.name = 'term_str'
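    The loadings computed above follow the usual convention of scaling each unit-length component by the standard deviation it explains (loadings = eigenvectors × √eigenvalues). A quick check of that identity on synthetic data (names illustrative):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6))

    pca = PCA(2).fit(X)

    # Scale each component (unit eigenvector) by the std dev along it
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

    # One column per component, one row per input feature
    assert loadings.shape == (6, 2)

    # Undoing the scaling recovers the unit eigenvectors exactly
    assert np.allclose(loadings[:, 0] / np.sqrt(pca.explained_variance_[0]),
                       pca.components_[0])
    ```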
    [54]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Prepare data
    df = PHI_LOADINGS.reset_index()
    ​
    plt.figure(figsize=(10, 8))  # Approx 600x700 pixels
    ​
    # Plot points
    sns.scatterplot(data=df, x=0, y=1, alpha=0.7)
    ​
    # Add term labels
    for _, row in df.iterrows():
        plt.text(row[0]+0.01, row[1]+0.01, row['term_str'], fontsize=8)
    ​
    # Axes and layout
    plt.xlabel("Component 0")
    plt.ylabel("Component 1")
    plt.title("Term Loadings on Topic Components")
    plt.tight_layout()
    plt.show()
    [42]:
     
    px.scatter(PHI_LOADINGS.reset_index(), 0, 1, text='term_str', height=600, width=700)
    [43]:
     
    pca_engine_theta = PCA(5)
    [44]:
     
    THETA_COMPS = pd.DataFrame(pca_engine_theta.fit_transform(normalize(THETA.T.values, norm='l2', axis=1)), index=THETA.T.index)
    THETA_COMPS.index.name = 'topic_id'
    [55]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Merge or concatenate if needed — assume THETA_COMPS and TOPICS are aligned by index
    df = THETA_COMPS.reset_index().copy()
    df['title'] = TOPICS.title.values
    df['doc_weight_sum'] = TOPICS.doc_weight_sum.values
    df['topic_id'] = TOPICS.index if 'topic_id' not in df.columns else df['topic_id']
    ​
    plt.figure(figsize=(10, 8))  # Approx 700x600
    ​
    # Scatterplot with point size and color
    scatter = sns.scatterplot(
        data=df, x=2, y=3, 
        size='doc_weight_sum', hue='title',
        legend=False, alpha=0.7
    )
    ​
    # Add topic ID labels to points
    for _, row in df.iterrows():
        plt.text(row[2]+0.1, row[3]+0.1, str(row['topic_id']), fontsize=8)
    ​
    # Formatting
    plt.xlabel("Component 2")
    plt.ylabel("Component 3")
    plt.title("Topic Distribution: THETA Components")
    plt.tight_layout()
    plt.show()
    [45]:
    px.scatter(THETA_COMPS.reset_index(), 2, 3, 
               size=TOPICS.doc_weight_sum, color=TOPICS.title, 
               text='topic_id', hover_name=TOPICS.title, 
               height=600, width=700)
    [46]:
     
    THETA_LOADINGS = pd.DataFrame(pca_engine_theta.components_.T * np.sqrt(pca_engine_theta.explained_variance_), index=THETA.index)
    DOCS = pd.DataFrame(DOCS)
    DOCS
    [46]:
    doc_str term_count
    screenplay_id scene_id
    joy 1 kitchen drive restaurant its 5
    2 sample pitch thinking heck spindle for shakes spindleó wrong notion chicken egg here milk shakes milk shakes latter customers order shake establishment its wait before again brand drive motor ability milk shakes mark dollars suckers stick at demand egg logic course bright fella idea beat say thoughtfully anyway 67
    4 car trunk back 3
    6 car sales watch its 3
    7 car customer spot front vast assortment items beef sandwiches tamales peanut butter chili dogs etc 10
    ... ... ... ...
    the_social_network 569 cool control panic control someone move is news now you no them somebody somebody coke cause there right ice home phone shut moment package desk earlier paper wrapping box box brand business cards business cards it womans voice vo mark 33
    572 conference room one left voice lights skyline picture windows her day yeah here mark company day salad something guy testimony myths devil now others steak office settlement agreement gonna settle 22
    573 yeah extra guys disclosure agreement word wife kids jury jury selection jury sees defendant hair likability practice law months jury story chicken werent sorority party night one police question it youve jury minutes animals drunk stupid blogging blogging 33
    574 them scheme things speeding ticket anybody computer minute problem help asshole youre coat briefcase exits computer name search box name picture 07 smiles mouse forth boxes fiadd box request friend clicks homepage waits response hits settlement dollars disclosure agreement sixth hits settlement name masthead founder 35
    575 chair night to members countries dollars billionaire world waits waits 7

    1846 rows × 2 columns

    [47]:
     
    DOCS['doc_label'] = DOCS.apply(lambda x: f"{LIB.loc[x.name[0]].raw_title}-{x.name[1]}", axis=1)
    DOCS['screen_play'] = DOCS.apply(lambda x: f"{LIB.loc[x.name[0]].raw_title}", axis=1)
    DOCS['n_chars'] = DOCS.doc_str.str.len()
    [56]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Combine THETA_LOADINGS with DOCS metadata
    df = THETA_LOADINGS.reset_index().copy()
    df['screen_play'] = DOCS['screen_play'].values  # assumes alignment by index
    ​
    plt.figure(figsize=(12, 8))  # approx 900x600
    ​
    # Scatterplot with color by screen_play
    sns.scatterplot(
        data=df, x=0, y=1,
        hue='screen_play', palette='tab10', alpha=0.7
    )
    ​
    plt.xlabel("Component 0")
    plt.ylabel("Component 1")
    plt.title("THETA Loadings by Screenplay")
    plt.legend(title='Screenplay', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    plt.show()
    [48]:
     
    px.scatter(THETA_LOADINGS.reset_index(), 0, 1, 
               # size=DOCS.n_chars, 
               color=DOCS.screen_play, 
               height=600, width=900)
    [59]:
     
    PHI_LOADINGS.head()
    [59]:
    0 1 2 3
    term_str
    05 -0.001056 -0.000118 -0.000036 0.003184
    1350000 -0.002852 0.002178 0.000811 -0.001873
    aback -0.000044 -0.001774 0.001317 0.000709
    ability 0.004477 0.002495 -0.001235 0.001143
    absorbing 0.002355 -0.001708 0.003512 -0.000382
    [60]:
     
    THETA_LOADINGS
    [60]:
    0 1 2 3 4
    screenplay_id scene_id
    joy 1 -0.002304 -0.003213 0.005371 0.003894 -0.006513
    2 0.002661 -0.000563 0.000541 0.002004 0.013783
    4 0.009537 0.000668 0.000696 -0.000457 -0.002512
    6 0.009537 0.000668 0.000696 -0.000457 -0.002512
    7 0.001526 0.007347 0.002287 0.001414 0.001315
    ... ... ... ... ... ... ...
    the_social_network 569 0.003603 -0.000761 -0.004378 -0.000921 0.009922
    572 0.002200 -0.001178 -0.001982 0.004720 0.001502
    573 -0.002736 -0.009356 -0.000108 -0.001901 -0.002036
    574 -0.004560 0.001273 0.000453 -0.008940 0.001517
    575 0.003153 0.001361 -0.001179 0.001994 -0.000507

    1846 rows × 5 columns

    [61]:
     
    DOCS
    [61]:
    doc_str term_count doc_label screen_play n_chars
    screenplay_id scene_id
    joy 1 kitchen drive restaurant its 5 Joy-1 Joy 28
    2 sample pitch thinking heck spindle for shakes spindleó wrong notion chicken egg here milk shakes milk shakes latter customers order shake establishment its wait before again brand drive motor ability milk shakes mark dollars suckers stick at demand egg logic course bright fella idea beat say thoughtfully anyway 67 Joy-2 Joy 312
    4 car trunk back 3 Joy-4 Joy 14
    6 car sales watch its 3 Joy-6 Joy 19
    7 car customer spot front vast assortment items beef sandwiches tamales peanut butter chili dogs etc 10 Joy-7 Joy 98
    ... ... ... ... ... ... ...
    the_social_network 569 cool control panic control someone move is news now you no them somebody somebody coke cause there right ice home phone shut moment package desk earlier paper wrapping box box brand business cards business cards it womans voice vo mark 33 The Social Network-569 The Social Network 235
    572 conference room one left voice lights skyline picture windows her day yeah here mark company day salad something guy testimony myths devil now others steak office settlement agreement gonna settle 22 The Social Network-572 The Social Network 196
    573 yeah extra guys disclosure agreement word wife kids jury jury selection jury sees defendant hair likability practice law months jury story chicken werent sorority party night one police question it youve jury minutes animals drunk stupid blogging blogging 33 The Social Network-573 The Social Network 255
    574 them scheme things speeding ticket anybody computer minute problem help asshole youre coat briefcase exits computer name search box name picture 07 smiles mouse forth boxes fiadd box request friend clicks homepage waits response hits settlement dollars disclosure agreement sixth hits settlement name masthead founder 35 The Social Network-574 The Social Network 316
    575 chair night to members countries dollars billionaire world waits waits 7 The Social Network-575 The Social Network 70

    1846 rows × 5 columns


    Second Two Topics¶

    [74]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Step 1: Get mean topic weight across documents
    topic_mean_weight = THETA.mean(axis=0)  # Series with index T00, T01, ..., T39
    ​
    # Step 2: Flatten PHI into long format
    phi_long = PHI.reset_index().melt(id_vars='topic_id', var_name='term_str', value_name='phi_weight')
    ​
    # Step 3: Merge mean topic weight into PHI table
    phi_long['mean_doc_weight'] = phi_long['topic_id'].map(topic_mean_weight)
    ​
    # Step 4: Run dimensionality reduction (e.g., PCA) to get x and y coords
    from sklearn.decomposition import PCA
    ​
    phi_matrix = PHI.values
    pca = PCA(n_components=2)
    xy = pca.fit_transform(phi_matrix)
    ​
    # Create topic -> (x, y) map
    topic_xy = pd.DataFrame(xy, columns=['x', 'y'], index=PHI.index)
    ​
    # Step 5: Merge x, y into phi_long
    phi_long = phi_long.merge(topic_xy, left_on='topic_id', right_index=True)
    ​
    # Step 6: Plot with seaborn + matplotlib
    plt.figure(figsize=(10, 8))
    sns.scatterplot(
        data=phi_long,
        x='x', y='y',
        size='mean_doc_weight',
        sizes=(20, 300),
        alpha=0.6,
        legend=False
    )
    ​
    # Optional: Label each topic
    for tid, row in topic_xy.iterrows():
        plt.text(row['x'], row['y'], tid, fontsize=10, ha='center', va='bottom')
    ​
    plt.xlabel("PCA Component 1")
    plt.ylabel("PCA Component 2")
    plt.title("Topics in PHI Loadings (sized by mean document weight)")
    plt.tight_layout()
    plt.show()
    [71]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Step 1: Compute mean doc weight per topic (THETA columns are topics)
    topic_mean_weight = THETA.mean(axis=0)  # Series indexed by topic_id

    # Step 2: Join to the topic component table on topic_id
    df = PHI_COMPS.reset_index().copy()
    df['mean_doc_weight'] = df['topic_id'].map(topic_mean_weight)

    # Step 3: Plot with size proportional to mean_doc_weight
    plt.figure(figsize=(10, 8))

    sns.scatterplot(
        data=df,
        x=2, y=3,
        size='mean_doc_weight',
        sizes=(20, 200),  # Adjust min and max dot sizes
        alpha=0.7,
        legend=False
    )

    # Add topic labels (PHI_COMPS rows are topics, so label with topic_id)
    for _, row in df.iterrows():
        plt.text(row[2], row[3], str(row['topic_id']), fontsize=9, ha='center', va='bottom')

    plt.xlabel("Component 2")
    plt.ylabel("Component 3")
    plt.title("Topic Components (2 vs 3), Sized by Mean Document Weight")
    plt.tight_layout()
    plt.show()
    [57]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Prepare DataFrame
    df = PHI_LOADINGS.reset_index()
    ​
    plt.figure(figsize=(10, 8))  # Approx 700x600 in pixels
    ​
    # Basic scatterplot
    sns.scatterplot(data=df, x=2, y=3, alpha=0.7)
    ​
    # Add term labels on top of each point
    for _, row in df.iterrows():
        plt.text(row[2], row[3], row['term_str'], fontsize=9, ha='center', va='bottom')
    ​
    plt.xlabel("Component 2")
    plt.ylabel("Component 3")
    plt.title("PHI Loadings (Component 2 vs 3)")
    plt.tight_layout()
    plt.show()
    [72]:
    THETA
    [72]:
    topic_id T00 T01 T02 T03 T04 T05 T06 T07 T08 T09 ... T30 T31 T32 T33 T34 T35 T36 T37 T38 T39
    screenplay_id scene_id
    joy 1 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 ... 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167 0.004167
    2 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 ... 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368 0.000368
    4 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 ... 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250
    6 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 ... 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250 0.006250
    7 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.911364 0.002273 0.002273 ... 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273 0.002273
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    the_social_network 569 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 ... 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.971324 0.000735 0.000735
    572 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 ... 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087 0.001087
    573 0.000735 0.000735 0.000735 0.971324 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 ... 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735 0.000735
    574 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 ... 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694 0.000694
    575 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 ... 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

    1846 rows × 40 columns

    [49]:
     
    px.scatter(PHI_LOADINGS.reset_index(), 2, 3, text='term_str', height=600, width=700)
    [66]:
     
    import matplotlib.pyplot as plt
    import seaborn as sns
    ​
    # Prepare DataFrame
    df = THETA_LOADINGS.reset_index().copy()
    df['screen_play'] = DOCS['screen_play'].values  # THETA_LOADINGS and DOCS share the same row order

    plt.figure(figsize=(12, 8))  # Approx 900x600 in pixels

    # Scatter plot with color by screen_play
    sns.scatterplot(data=df, x=2, y=3, hue='screen_play', palette='tab10', s=60, alpha=0.8)
    ​
    plt.xlabel("Component 02")
    plt.ylabel("Component 03")
    plt.title("THETA Loadings Colored by Screenplay")
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    plt.show()
    df
    [66]:
    screenplay_id scene_id 0 1 2 3 4 screen_play
    0 joy 1 -0.002304 -0.003213 0.005371 0.003894 -0.006513 Joy
    1 joy 2 0.002661 -0.000563 0.000541 0.002004 0.013783 Joy
    2 joy 4 0.009537 0.000668 0.000696 -0.000457 -0.002512 Joy
    3 joy 6 0.009537 0.000668 0.000696 -0.000457 -0.002512 Joy
    4 joy 7 0.001526 0.007347 0.002287 0.001414 0.001315 Joy
    ... ... ... ... ... ... ... ...
    1841 the_social_network 569 0.003603 -0.000761 -0.004378 -0.000921 0.009922 The Social Network
    1842 the_social_network 572 0.002200 -0.001178 -0.001982 0.004720 0.001502 The Social Network
    1843 the_social_network 573 -0.002736 -0.009356 -0.000108 -0.001901 -0.002036 The Social Network
    1844 the_social_network 574 -0.004560 0.001273 0.000453 -0.008940 0.001517 The Social Network
    1845 the_social_network 575 0.003153 0.001361 -0.001179 0.001994 -0.000507 The Social Network

    1846 rows × 8 columns

    [50]:
     
    px.scatter(THETA_LOADINGS.reset_index(), 2, 3, 
               # size=DOCS.n_chars, 
               color=DOCS.screen_play, 
               height=600, width=900)
    [51]:
     
    TPAIRS.to_csv(f"{output_dir}/{data_prefix}-TOPICPAIRS-{n_topics}.csv", index=True)
    LIB.to_csv(f"{output_dir}/{data_prefix}-LIB-KEY.csv", index=True)
    VOCAB.to_csv(f"{output_dir}/{data_prefix}-VOCAB2.csv", index=True)
    THETA_LOADINGS.to_csv(f"{output_dir}/{data_prefix}-THETA_LOADINGS.csv", index=True)
    THETA_COMPS.to_csv(f"{output_dir}/{data_prefix}-THETA_COMPS.csv", index=True)
    [ ]:
     
    ​

    Final Project Notebook¶

    DS 5001 Text as Data | Spring 2025


    Metadata¶

    • Full Name: Gabriella Cordelli
    • Userid:
    • GitHub Repo URL: https://github.com/GEMcordelli/Text-Analytics-Project-Digital-Analytical-Addition
    • UVA Box URL:

    Overview¶

    The goal of the final project is for you to create a digital analytical edition of a corpus using the tools, practices, and perspectives you’ve learned in this course. You will select a corpus that has already been digitized and transcribed, parse it into an F-compliant set of tables, and then generate and visualize the results of a series of fitted models. You will also draw some tentative conclusions regarding the linguistic, cultural, psychological, or historical features represented by your corpus. The point of the exercise is to have you work with a corpus through the entire pipeline from ingestion to interpretation.

    Specifically, you will acquire a collection of long-form texts and perform the following operations:

    • Convert the collection from their source formats (F0) into a set of tables that conform to the Standard Text Analytic Data Model (F2).
    • Annotate these tables with statistical and linguistic features using NLP libraries such as NLTK (F3).
    • Produce a vector representation of the corpus to generate TFIDF values to add to the TOKEN (aka CORPUS) and VOCAB tables (F4).
    • Model the annotated and vectorized model with tables and features derived from the application of unsupervised methods, including PCA, LDA, and word2vec (F5).
    • Explore your results using statistical and visual methods.
    • Present conclusions about patterns observed in the corpus by means of these operations.

    When you are finished, you will make the results of your work available in GitHub (for code) and UVA Box (for data). You will submit to Gradescope (via Canvas) a PDF version of a Jupyter notebook that contains the information listed below.
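    The operations above can be sketched end to end in miniature; the corpus strings, column names, and parameters below are placeholders for illustration, not project requirements:

    ```python
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import normalize

    # F0 -> F2: a toy corpus standing in for parsed source documents
    docs = pd.Series({
        'doc1': 'milk shakes and burgers at the drive in',
        'doc2': 'mortgage bonds and credit default swaps',
        'doc3': 'computers dorm rooms and business cards',
    })

    # F4: vectorize into a document-term matrix of TFIDF values
    vec = TfidfVectorizer(stop_words='english')
    DTM = pd.DataFrame(vec.fit_transform(docs).toarray(),
                       index=docs.index, columns=vec.get_feature_names_out())

    # F5: reduce the L2-normalized matrix with an unsupervised model (PCA here)
    comps = PCA(2).fit_transform(normalize(DTM.values, norm='l2'))
    assert comps.shape == (3, 2)
    ```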

    Some Details¶

    • Please fill out your answers in each task below by editing the markdown cell.
    • Replace text that asks you to insert something with the thing, i.e. replace (INSERT IMAGE HERE) with an image element, e.g. ![](image.png).
    • For URLs, just paste the raw URL directly into the text area. Don't worry about providing link labels using [label](link).
    • Please do not alter the structure of the document or cell, i.e. the bulleted lists.
    • You may add explanatory paragraphs below the bulleted lists.
    • Please name your tables as they are named in each task below.
    • Tasks are indicated by headers with point values in parentheses.

    Raw Data¶


    Source Description (1)¶

    Provide a brief description of your source material, including its provenance and content. Tell us where you found it and what kind of content it contains.

    (INSERT DESCRIPTION HERE)


    Source Features (1)¶

    Add values for the following items. (Do this for all following bulleted lists.)

    • Source URL:
    • UVA Box URL:
    • Number of raw documents:
    • Total size of raw documents (e.g. in MB):
    • File format(s), e.g. XML, plaintext, etc.:

    Source Document Structure (1)¶

    Provide a brief description of the internal structure of each document. That is, describe the typical elements found in each document and their relation to each other. For example, a corpus of letters might be described as having a date, an addressee, a salutation, a set of content paragraphs, and a closing. If the documents have varying structures, state that.

    (INSERT DESCRIPTION HERE)


    Parsed and Annotated Data¶

    Parse the raw data into the three core tables of your edition: the LIB, CORPUS, and VOCAB tables.

    These tables will be stored as CSV files with header rows.

    You may consider using | as a delimiter.

    Provide the following information for each.


    LIB (2)¶

    The source documents the corpus comprises. These may be books, plays, newspaper articles, abstracts, blog posts, etc.

    Note that these are not documents in the sense used to describe a bag-of-words representation of a text, e.g. chapter.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Number of observations:
    • List of features, including at least three that may be used for model summarization (e.g. date, author, etc.):
    • Average length of each document in characters:

    CORPUS (2)¶

    The sequence of word tokens in the corpus, indexed by their location in the corpus and document structures.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Number of observations (should be between 500,000 and 2,000,000):
    • OHCO Structure (as delimited column names):
    • Columns (as delimited column names, including token_str, term_str, pos, and pos_group):

    VOCAB (2)¶

    The unique word types (terms) in the corpus.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Number of observations:
    • Columns (as delimited names, including n, p', i, dfidf, porter_stem, max_pos and max_pos_group, stop):
    • Note: Your VOCAB may contain ngrams. If so, add a feature for ngram_length.
    • List the top 20 significant words in the corpus by DFIDF.

    (INSERT LIST HERE)
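    Assuming the VOCAB table already carries the dfidf column listed above, the top-20 list can be read off with a sort; a sketch on a toy table (terms and values are invented):

    ```python
    import pandas as pd

    # Toy VOCAB with a precomputed dfidf column (the real table has many more features)
    VOCAB = pd.DataFrame({'dfidf': [3.2, 9.1, 5.5, 1.0]},
                         index=['cat', 'dog', 'fish', 'ant'])
    VOCAB.index.name = 'term_str'

    top = VOCAB.sort_values('dfidf', ascending=False).head(20)
    print(top.index.tolist())  # ['dog', 'fish', 'cat', 'ant']
    ```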


    ## BOW (3)

    A bag-of-words representation of the CORPUS.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Bag (expressed in terms of OHCO levels):
    • Number of observations:
    • Columns (as delimited names, including n, tfidf):
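A minimal sketch of deriving a BOW with counts from a token-level CORPUS table, assuming this project's OHCO of screenplay_id and scene_id (the toy CORPUS here is a stand-in):

```python
import pandas as pd

OHCO = ['screenplay_id', 'scene_id']  # bag level used in this project

# Toy CORPUS: one row per token, located by its OHCO coordinates
CORPUS = pd.DataFrame({
    'screenplay_id': ['joy', 'joy', 'joy', 'the_help'],
    'scene_id':      [1, 1, 1, 1],
    'term_str':      ['mop', 'mop', 'sell', 'write'],
})

# BOW: count of each term within each bag
BOW = CORPUS.groupby(OHCO + ['term_str']).size().to_frame('n')
```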
    ## DTM (3)

    A representation of the BOW as a sparse count matrix.

    • UVA Box URL:
    • UVA Box URL of BOW used to generate (if applicable):
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Bag (expressed in terms of OHCO levels):
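A minimal sketch of pivoting a BOW into a document-term count matrix with pandas (toy data; the real BOW is read from CSV):

```python
import pandas as pd

# Toy BOW (bag = screenplay_id + scene_id)
BOW = pd.DataFrame({
    'screenplay_id': ['joy', 'joy', 'the_help'],
    'scene_id':      [1, 1, 1],
    'term_str':      ['mop', 'sell', 'write'],
    'n':             [2, 1, 1],
}).set_index(['screenplay_id', 'scene_id', 'term_str'])

# DTM: pivot term counts into a (mostly zero) document-term matrix
DTM = BOW['n'].unstack(fill_value=0)
```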
    ## TFIDF (3)

    A Document-Term matrix with TFIDF values.

    • UVA Box URL:
    • UVA Box URL of DTM or BOW used to create:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Description of TFIDF formula (LaTeX OK):
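One common TF-IDF variant (term frequency normalized by bag length, with a log2 inverse document frequency) can be sketched as follows; this illustrates the general formula, not necessarily the exact variant chosen for this edition:

```python
import numpy as np
import pandas as pd

# Toy DTM: rows are bags (documents), columns are terms
DTM = pd.DataFrame(
    [[2, 1, 0],
     [0, 1, 3]],
    index=['doc1', 'doc2'],
    columns=['mop', 'sell', 'write'],
)

# tf = n / bag length; idf = log2(N / df)
TF = (DTM.T / DTM.T.sum()).T          # row-normalize by document length
DF = DTM.astype(bool).sum()           # number of docs containing each term
IDF = np.log2(len(DTM) / DF)
TFIDF = TF * IDF
```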
    ## Reduced and Normalized TFIDF_L2 (3)

    A Document-Term matrix with L2 normalized TFIDF values.

    • UVA Box URL:
    • UVA Box URL of source TFIDF table:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Number of features (i.e. significant words):
    • Principle of significant word selection:
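The L2 normalization itself can be sketched as below, mirroring the (TFIDF.T / norm(TFIDF, 2, axis=1)).T idiom used later in this notebook (toy matrix for illustration):

```python
import numpy as np
import pandas as pd
from scipy.linalg import norm

# Toy TFIDF matrix: rows are bags, columns are terms
TFIDF = pd.DataFrame([[3.0, 4.0], [1.0, 0.0]],
                     index=['doc1', 'doc2'], columns=['mop', 'sell'])

# L2-normalize each row so every document vector has unit length
TFIDF_L2 = (TFIDF.T / norm(TFIDF, 2, axis=1)).T

# Sanity check: each row should now have length 1
row_lengths = np.sqrt((TFIDF_L2 ** 2).sum(axis=1))
```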
    # Models

    ## PCA Components (4)

    • UVA Box URL:
    • UVA Box URL of the source TFIDF_L2 table:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Number of components: 10
    • Library used to generate:
    • Top 5 positive terms for first component: continued slots noó shown objection draw
    • Top 5 negative terms for second component: is more s was are do have be
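A minimal sketch of generating a document-component matrix and loadings with scikit-learn's PCA (a random toy matrix stands in for TFIDF_L2; the term names are invented):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy L2-normalized document-term matrix (rows: bags, cols: terms)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((20, 6)),
                 columns=['mop', 'sell', 'arches', 'code', 'maid', 'bank'])

pca = PCA(n_components=2)
DCM = pca.fit_transform(X)                  # document-component matrix
LOADINGS = pd.DataFrame(pca.components_,    # component-term matrix
                        columns=X.columns,
                        index=['PC0', 'PC1'])

# Top positive / negative terms for a component
top_pos = LOADINGS.loc['PC0'].sort_values(ascending=False).head(5).index.tolist()
top_neg = LOADINGS.loc['PC0'].sort_values().head(5).index.tolist()
```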
    ## PCA DCM (4)

    The document-component matrix generated.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    ## PCA Loadings (4)

    The component-term matrix generated.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    ## PCA Visualization 1 (4)

    Include a scatterplot of documents in the space created by the first two components.

    Color the points based on a metadata feature associated with the documents.

    Also include a scatterplot of the loadings for the same two components. (This does not need a feature mapped onto color.)

    #### PCA First
    ![erroruse.png](attachment:bdaf3d4a-f6fe-47aa-adb9-ca634e0bc967.png)

    #### PCA Second
    ![image.png](attachment:13c8f9a2-6cd6-4e8e-b101-6c10375652ad.png)

    Briefly describe the nature of the polarity you see in the first component:

    Again, we struggle to see the intricacies without Plotly; however, polarity is on full display in the graph, with two distinct high and low clusters for PC1 and a notable absence of PC0 across most films. The only exception is Steve Jobs, which appears uniquely inconsistent where most other films show a distinctive pattern. This is perhaps because "Steve Jobs" is the only documentary film on our list, and therefore exhibits patterns inconsistent with its scripted counterparts: a documentary does not follow the same dialogue and narrative flow as a typical film.
    ## PCA Visualization 2 (4)

    Include a scatterplot of documents in the space created by the second two components.

    Color the points based on a metadata feature associated with the documents.

    Also include a scatterplot of the loadings for the same two components. (This does not need a feature mapped onto color.)

    ![image.png](attachment:206217ff-39e2-40a1-bde9-4bf6a4a09829.png)

    ![erroruse.png](attachment:82ff5592-a949-4f90-a4ed-2314c852a33f.png)

    Briefly describe the nature of the polarity you see in the second component:

    In this PCA visual we see a very different pattern than before. Once again, likely for the same reasons as previously, Steve Jobs shows the least consistency across each PC, here to a far greater degree than before, indicating that these PCs likely do not encompass any aspect of that film well. We also see an interesting diagonal clustering toward the positive end of PC2: The Social Network rests closer to the neutral point, while The Big Short is far positive on both. The diagonal gradient for the films between those endpoints is interesting to see; it shows both what brings these entrepreneurial films together and what teases them apart.
    ## LDA TOPIC (4)

    • UVA Box URL:
    • UVA Box URL of count matrix used to create:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Library used to compute: sklearn
    • A description of any filtering, e.g. POS (Nouns and Verbs only): Nouns & Adjectives
    • Number of components: 39
    • Any other parameters used:
    • Top 5 words and best-guess labels for the top five topics by mean document weight:
      • T00: continued lot arches angles parking
      • T01: bonds mortgage banks car mortgage
      • T02: computer room moment door skeeter
      • T03: yule skeeter os door map
      • T04: girls kitchen door room skeeter
    ## LDA THETA (4)

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    ## LDA PHI (4)

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
     
    ## LDA + PCA Visualization (4)

    Apply PCA to the PHI table and plot the topics in the space opened by the first two components.

    Size the points based on the mean document weight of each topic (using the THETA table).

    Color the points based on a metadata feature from the LIB table.

    Provide a brief interpretation of what you see.

    #### PHI with THETA Scatter Size
    Because my Plotly was not rendering properly throughout the semester, I am not able to see the granular details of what is going on in the lower-left corner of the graph, which is likely where most of the components' intricacies lie. Even so, the plot does tell us that components 1 and 2 contain very subtle differences for the most part, while there is an interesting and glaring outlier polarity in loadings T27 and T19. I think this may be because these topics capture fairly film-specific categories. For example, T27's vocabulary appears to pull largely from The Founder and generally has a commercial fast-food bent: continued lot arches angles parking lot parking cover restaurant things.

    ![image.png](attachment:e99c52af-e80f-4709-8f60-1ee1ce4e6c3f.png)

    ## Sentiment VOCAB_SENT (4)

    Sentiment values associated with a subset of the VOCAB from a curated sentiment lexicon.

    • UVA Box URL:
    • UVA Box URL for source lexicon: https://www.dropbox.com/scl/fo/kdrbta82xj975r7eaipni/AHjys-3CkEhLKVllLBSAZRs/lexicons?dl=0&preview=salex_nrc.csv&rlkey=ucswokipct8i2g0fxemosbx81&subfolder_nav_tracking=1
    • GitHub URL for notebook used to create:
    • Delimiter:
    ## Sentiment BOW_SENT (4)

    Sentiment values from VOCAB_SENT mapped onto BOW.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    ## Sentiment DOC_SENT (4)

    Sentiment per bag, computed from BOW_SENT.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter:
    • Document bag expressed in terms of OHCO levels: Paragraphs
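A minimal sketch of aggregating token-level sentiment into per-bag scores (toy BOW_SENT with an assumed polarity column; real values come from the lexicon):

```python
import pandas as pd

# Toy BOW with sentiment values already joined from a lexicon (BOW_SENT)
BOW_SENT = pd.DataFrame({
    'screenplay_id': ['joy', 'joy', 'joy', 'the_help'],
    'scene_id':      [1, 1, 2, 1],
    'term_str':      ['happy', 'angry', 'sad', 'hope'],
    'n':             [2, 1, 1, 3],
    'polarity':      [1.0, -1.0, -1.0, 1.0],
})

# Weight each term's sentiment by its count, then average per bag
BOW_SENT['sent'] = BOW_SENT['n'] * BOW_SENT['polarity']
DOC_SENT = (BOW_SENT.groupby(['screenplay_id', 'scene_id'])['sent'].sum()
            / BOW_SENT.groupby(['screenplay_id', 'scene_id'])['n'].sum())
```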
    ## Sentiment Plot (4)

    Plot sentiment over some metric space, such as time.

    If you don't have a metric metadata feature, plot sentiment over a feature of your choice.

    You may use a bar chart or a line graph.

    #### scene_id as a proxy for narrative time (in terms of desired layout, not chronology, because the film is not in order of events)


    ## VOCAB_W2V (4)

    A table of word2vec features associated with terms in the VOCAB table.

    • UVA Box URL:
    • GitHub URL for notebook used to create:
    • Delimiter: ,
    • Document bag expressed in terms of OHCO levels: Paragraphs
    • Number of features generated:
    • The library used to generate the embeddings: Gensim
    ## Word2vec tSNE Plot (4)

    Plot word embedding features in two dimensions using t-SNE.

    #### Had to use seaborn and matplotlib because my Plotly has been consistently outputting no data all semester, even though the data frames I'm feeding it are correct

    image.png

    There is a small cluster in the (-10, 3) area of the graph with an interesting theme of novelty and youth. Words such as "new, young, little, against & few" are indicative of the association between recounting stories of entrepreneurial innovation and depictions of innovation that rubs against the grain of society. Words like "few" and "against" make me think of the notion that traditional self-made American stories are framed as "breaking the mold" and thinking in ways that people may not agree with at first. For example, early in The Social Network, naysayers argue that other platforms already have more appeal than Facebook ever would, and Steve Jobs is remembered for struggling early in his career to get people to "see his vision". This phenomenon is additionally often associated with youth; a fresh mind is a flexible mind. For example, in The Help, Emma Stone's character Skeeter is a new graduate when she sets out to write a revolutionary account of Black maids' experiences with their white bosses in 1960s Jackson, Mississippi.
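The 2-D projection behind a plot like this can be sketched with scikit-learn's TSNE (random vectors standing in for the word2vec embeddings):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy word2vec-style embedding matrix: one 16-d vector per vocabulary term
rng = np.random.default_rng(1)
vectors = rng.random((30, 16))

# Project to 2-D; perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=5, random_state=42, init='random')
coords = tsne.fit_transform(vectors)
```

The resulting coords array can then be scattered with seaborn or matplotlib and annotated with the term strings.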

    # Riffs

    Provide at least three visualizations that combine the preceding model data in interesting ways.

    These should provide insight into how features in the LIB table are related.

    The nature of this relationship is left open to you -- it may be correlation, or mutual information, or something less well defined.

    In doing so, consider the following visualization types:

    • Hierarchical cluster diagrams
    • Heatmaps
    • Scatter plots
    • KDE plots
    • Dispersion plots
    • t-SNE plots
    • etc.
    ## Riff 2 (5)

    #### The Social Network ~ Trust & Anger

    trust-tsn.png

    anger-tsn.png

    I chose to look at patterns of trust and anger in The Social Network because the commodification of entrepreneurs in media values the dramatic fluctuations in trust and stability as an idea begins to gain traction, or when it is being doubted. In The Social Network, this pressure results in the deterioration of friendships. Interestingly, and in line with the importance of relationships in the film, trust and anger share a distinct trade-off from the beginning to the end, likely reflecting the story line of Mark Zuckerberg and Eduardo Saverin's close friendship in college and partnership in the brand, through to the moment of depicted betrayal as Eduardo is phased out by Mark and his business consultants.

    ## Riff 1 (5)

    #### Whole Dataset Hierarchical Cluster Based on Word2Vec ~ Screenplay Title as Metadata

    Hierarchical Cluster-Wholeset.png

    There appear to be many narrow, long distances toward the top of the hierarchical cluster. While the LIB metadata was not directly used in this graph, its output tells us something about the nature of the data from document to document. I say this because throughout the clusters there are term strings comprised mostly of character names. Here, and at other points in my analysis, this is particularly visible with The Help: addressing an individual by character name appears especially prevalent in this film. One theory as to why is that The Help takes place in the South in the 1960s and is largely centered on women who are stay-at-home mothers. In such an environment and time period, where decorum is valued and social networks are tight-knit, characters may be more inclined to call each other formally by name. The Help has two distinctive social groups throughout the plot: the white housewives and their Black maids. Among one another, the housewives value formality, so they often refer to each other by first name, or perhaps as Miss. The maids are employees of the housewives, and therefore almost never refer to their bosses informally; their bosses, alternatively, almost exclusively refer to their maids informally, using first names or nicknames. There are also quite a few nicknames in the cluster, which I believe was a choice by the writers to enhance the feeling that this circle of friends was well-established, with ties to one another all the way from childhood (for example, Emma Stone's character going by Skeeter).
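A dendrogram like the one discussed here can be sketched with SciPy's hierarchical clustering on the embedding vectors (random toy vectors; the real input is the word2vec feature table):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy embedding vectors for a handful of terms
rng = np.random.default_rng(2)
vectors = rng.random((10, 8))

# Ward linkage on the term vectors, as used for a dendrogram
Z = linkage(vectors, method='ward')

# Cut the tree into 3 flat clusters for inspection
labels = fcluster(Z, t=3, criterion='maxclust')
```

Passing Z to scipy.cluster.hierarchy.dendrogram (with the term strings as labels) draws the tree itself.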

    ## Riff 3 (5)

    #### Joy

    anger.png

    joy-sadness.png

    trust.png

    #### Joy-Sadness Trade-Off for The Social Network

    joy-sad-tsn.png

    I chose to compare the sentiments of The Social Network and Joy because they are both stories about an entrepreneur's journey to success. While the character Joy in the film Joy was more of a genuine independent, I thought it would be interesting to compare this story with Jesse Eisenberg's portrayal of Mark Zuckerberg, because in that film Mark falls in and out of many relationships but maintains an internal, stubborn independence. Joy and sadness remain far more consistent in The Social Network, whereas in the film Joy there is a large spike at the end of the first third of the movie. Joy otherwise maintains a rather consistent sentiment trend, whereas The Social Network has a far more temperamental curve; this, again, could have to do with its series of relationship turmoils. In both cases, we see a general downward trend in sadness over the course of the film. In Joy, there is an inverse relationship between sadness and joy toward the film's end, whereas in The Social Network they grow together. This could have to do with differences in how each film unfolds. Both films involve a legal case that is resolved at the film's climax, but in Joy there is a more positive connotation with her ability to patent her mop invention, because it was a long-standing struggle which she overcame. However, in The Social Network, there is an air of accomplishment that overlays a larger gloom: the lawsuit settled out of court, money was owed to those who filed against Facebook, and friendships fell apart, yet Mark Zuckerberg still maintained an elevated lifestyle and was generally considered the prodigy brainchild of the company.

    # Interpretation (4)

    Describe something interesting about your corpus that you discovered during the process of completing this assignment.

    At a minimum, use 250 words, but you may use more. You may also add images if you'd like.

    (INSERT INTERPRETATION HERE)

    [ ]:
     
    ​
    [1]:
     
    import pandas as pd
    import numpy as np
    from scipy.linalg import norm
    [2]:
     
    import plotly_express as px
    import seaborn as sns
    [3]:
     
    sns.set(style='ticks')
    [4]:
     
    import configparser
    config = configparser.ConfigParser()
    config.read("env.ini")
    data_home = config['DEFAULT']['data_home']
    output_dir = config['DEFAULT']['output_dir']
    local_lib = config['DEFAULT']['local_lib']
    [5]:
     
    data_prefix = 'entrepreneur'
    [6]:
     
    OHCO = ['screenplay_id', 'scene_id']
    colors = "YlGnBu"
    [7]:
     
    LIB = pd.read_csv(f'{output_dir}/{data_prefix}-LIB.csv').set_index('screenplay_id')
    VOCAB = pd.read_csv(f'{output_dir}/{data_prefix}-VOCAB-PARAS.csv').set_index('term_str')
    BOW = pd.read_csv(f'{output_dir}/{data_prefix}-BOW-PARAS.csv').set_index(OHCO+['term_str'])
    [8]:
     
    VOCAB[['n','p','i']].head(20)
    [8]:
    n p i
    term_str
    the 6340 0.042129 4.569051
    a 4009 0.026639 5.230291
    to 3642 0.024201 5.368802
    and 2999 0.019928 5.649052
    you 2737 0.018187 5.780938
    i 2271 0.015091 6.050206
    of 2159 0.014346 6.123170
    in 1841 0.012233 6.353044
    it 1585 0.010532 6.569051
    kroc 1501 0.009974 6.647609
    on 1399 0.009296 6.749137
    is 1398 0.009290 6.750169
    steve 1346 0.008944 6.804855
    that 1074 0.007137 7.130539
    mark 962 0.006392 7.289425
    with 903 0.006000 7.380736
    ray 889 0.005907 7.403278
    we 888 0.005901 7.404902
    at 876 0.005821 7.424531
    he 850 0.005648 7.467999
    [9]:
     
    BOW
    [9]:
    para_num n tf tfidf
    screenplay_id scene_id term_str
    joy 1 a 0 1 0.090909 0.134575
    drive 0 1 0.090909 0.603804
    in 0 1 0.090909 0.177906
    its 0 1 0.090909 0.302422
    kitchen 0 1 0.090909 0.525700
    ... ... ... ... ... ... ...
    the_social_network 575 waits 1 2 0.040000 0.330430
    we 1 1 0.020000 0.058725
    world 1 1 0.020000 0.125735
    youngest 1 1 0.020000 0.241362
    zuckerberg 1 1 0.020000 0.144203

    109759 rows × 4 columns

    [10]:
     
    BOW_reduced = BOW.groupby(['screenplay_id', 'scene_id', 'term_str'])['tfidf'].sum().unstack(fill_value=0)
    ​
    TFIDF = BOW_reduced
    pos_set = ['NN', 'VB']
    VOCAB['dfidf'] = VOCAB['df'] * VOCAB['idf']
    VSHORT = VOCAB[VOCAB.max_pos_group.isin(['NN', 'VB', 'JJ']) & ~VOCAB.max_pos.isin(['NNP'])].sort_values('dfidf', ascending=False).head(5000)
    TFIDF = TFIDF[VSHORT.index]
    [11]:
     
    VOCAB
    [11]:
    term_rank n n_chars p i max_pos max_pos_group n_pos_group cat_pos_group n_pos cat_pos stop term_rank2 zipf_k zipf_k2 log_r df idf dfidf
    term_str
    the 1 6340 3 0.042129 4.569051 DT DT DT {'DT', 'JJ', 'NN', 'VB'} 5 {'DT', 'NN', 'VBP', 'NNP', 'JJ'} 1 1 6340 6340 0.000000 1936 1.149243 2224.934910
    a 2 4009 1 0.026639 5.230291 DT DT DT {'DT', 'JJ', 'NN'} 5 {'DT', 'NN', 'NNS', 'NNP', 'JJ'} 1 2 8018 8018 1.000000 1539 1.480329 2278.226269
    to 3 3642 2 0.024201 5.368802 TO TO TO {'NN', 'JJ', 'TO', 'RP', 'IN', 'VB'} 12 {'NN', 'NNS', 'VBZ', 'VBP', 'NNP', 'JJ', 'TO',... 1 3 10926 10926 1.584963 1608 1.417055 2278.624094
    and 4 2999 3 0.019928 5.649052 CC CC CC {'NN', 'RB', 'CC', 'IN', 'VB'} 6 {'NN', 'VBP', 'NNP', 'RB', 'CC', 'IN'} 1 4 11996 11996 2.000000 1459 1.557342 2272.162427
    you 5 2737 3 0.018187 5.780938 PRP PR PR {'NN', 'RB', 'JJ', 'PD', 'IN', 'CD', 'VB', 'PR'} 15 {'JJR', 'NN', 'NNS', 'VBZ', 'VBP', 'NNP', 'RB'... 1 5 13685 13685 2.321928 1109 1.953063 2165.946674
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    jd 10396 1 2 0.000007 17.199318 NNP NN NN {'NN'} 1 {'NNP'} 0 259 10396 259 13.343741 1 12.068106 12.068106
    jaredó 10397 1 6 0.000007 17.199318 NNP NN NN {'NN'} 1 {'NNP'} 0 259 10397 259 13.343880 1 12.068106 12.068106
    jar 10398 1 3 0.000007 17.199318 NN NN NN {'NN'} 1 {'NN'} 0 259 10398 259 13.344018 1 12.068106 12.068106
    jammed 10399 1 6 0.000007 17.199318 JJ JJ JJ {'JJ'} 1 {'JJ'} 0 259 10399 259 13.344157 1 12.068106 12.068106
    flwhy 10400 1 4 0.000007 17.199318 NNP NN NN {'NN'} 1 {'NNP'} 0 259 10400 259 13.344296 1 12.068106 12.068106

    10400 rows × 19 columns


    Adding Some Labels

    [12]:
     
    genre_csv = """
    joy, comedy/drama, 2015
    the_founder, gothic, 2016
    the_social_network, drama/historical_fiction, 2009
    steve_jobs, drama/history, 2015
    the_help, drama/historical_fiction, 2011
    the_big_short, comedy/thriller, 2015
    """.split('\n')[1:-1]
    genre = pd.DataFrame([line.split(', ') for line in genre_csv], columns=['screenplay_id','genre', 'year'])
    genre['book_id'] = genre['screenplay_id']
    genre = genre.set_index('screenplay_id')
    
    [13]:
     
    LIB = pd.concat([LIB, genre], axis=1)
    [17]:
    LIB['title'] = LIB['raw_title']
    #LIB = LIB.drop(['raw_title'])
    ​
    LIB_COLS = ['title', 'genre', 'year']
    #LIB=LIB.drop(['source_file_path', 'scene_regex'])
    LIB[LIB_COLS].head()
    LIB
    [17]:
    source_file_path raw_title scene_regex genre year title
    screenplay_id
    joy /sfs/weka/scratch/gec2tp/data/entrepreneur/Joy... Joy ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... comedy/drama 2015 Joy
    steve_jobs /sfs/weka/scratch/gec2tp/data/entrepreneur/Ste... Steve Jobs ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... drama/history 2015 Steve Jobs
    the_big_short /sfs/weka/scratch/gec2tp/data/entrepreneur/The... The Big Short ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... comedy/thriller 2015 The Big Short
    the_founder /sfs/weka/scratch/gec2tp/data/entrepreneur/The... The Founder ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... gothic 2016 The Founder
    the_help /sfs/weka/scratch/gec2tp/data/entrepreneur/The... The Help ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... drama/historical_fiction 2011 The Help
    the_social_network /sfs/weka/scratch/gec2tp/data/entrepreneur/The... The Social Network ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... drama/historical_fiction 2009 The Social Network

    ## PCA

    [18]:
     
    TFIDF_L2 = (TFIDF.T / norm(TFIDF, 2, axis=1)).T
    [19]:
     
    TFIDF_L2
    [19]:
    term_str is are have be do know was get dont were ... amble angers amfl andthen andmr andi anchoring ample appreciates appreciative
    screenplay_id scene_id
    joy 1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    2 0.000000 0.067258 0.035151 0.071038 0.107690 0.076506 0.000000 0.000000 0.040008 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    7 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    the_social_network 570 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    572 0.126864 0.057698 0.000000 0.000000 0.000000 0.065632 0.132038 0.068215 0.000000 0.072352 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    573 0.000000 0.000000 0.056207 0.000000 0.000000 0.000000 0.061529 0.063575 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    574 0.029322 0.000000 0.000000 0.084511 0.042705 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    575 0.137268 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    2417 rows × 5000 columns

    [20]:
     
    a = len(TFIDF_L2)
    TFIDF_L2 = TFIDF_L2.dropna()
    b = len(TFIDF_L2)
    bag_loss = a - b
    bag_loss
    [20]:
    549
    [21]:
     
    TFIDF_L2
    [21]:
    term_str is are have be do know was get dont were ... amble angers amfl andthen andmr andi anchoring ample appreciates appreciative
    screenplay_id scene_id
    joy 1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    2 0.000000 0.067258 0.035151 0.071038 0.107690 0.076506 0.000000 0.000000 0.040008 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    7 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    the_social_network 569 0.114021 0.000000 0.000000 0.109544 0.110709 0.117976 0.059336 0.061310 0.061695 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    572 0.126864 0.057698 0.000000 0.000000 0.000000 0.065632 0.132038 0.068215 0.000000 0.072352 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    573 0.000000 0.000000 0.056207 0.000000 0.000000 0.000000 0.061529 0.063575 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    574 0.029322 0.000000 0.000000 0.084511 0.042705 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    575 0.137268 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    1868 rows × 5000 columns

    [22]:
     
    COV = TFIDF_L2.cov() # This also centers the vectors
    COV.head()
    [22]:
    term_str is are have be do know was get dont were ... amble angers amfl andthen andmr andi anchoring ample appreciates appreciative
    term_str
    is 0.002642 0.000293 0.000216 0.000154 0.000074 0.000178 -0.000003 0.000062 0.000051 0.000086 ... -8.438301e-07 0.000004 0.000012 0.000022 0.000002 -0.000006 -6.896139e-07 -0.000006 -0.000003 0.000001
    are 0.000293 0.001852 0.000193 0.000128 0.000100 0.000089 0.000026 0.000192 0.000126 0.000019 ... -4.966656e-07 -0.000001 -0.000002 -0.000003 0.000002 -0.000003 -9.298732e-07 -0.000004 0.000004 0.000012
    have 0.000216 0.000193 0.001570 0.000210 0.000266 0.000172 0.000229 0.000086 0.000283 0.000126 ... 5.719707e-07 -0.000001 0.000003 -0.000003 0.000002 0.000013 -7.939805e-07 -0.000003 0.000004 -0.000002
    be 0.000154 0.000128 0.000210 0.001928 0.000179 0.000185 0.000178 0.000020 0.000193 0.000086 ... -4.186387e-07 0.000002 -0.000001 -0.000002 -0.000001 0.000005 -7.837887e-07 -0.000003 -0.000002 -0.000002
    do 0.000074 0.000100 0.000266 0.000179 0.002052 0.000334 0.000255 0.000271 0.000291 0.000040 ... -4.500072e-07 0.000002 -0.000002 -0.000003 0.000002 0.000005 -8.425179e-07 -0.000003 0.000004 -0.000002

    5 rows × 5000 columns

    [23]:
     
    from scipy.linalg import eigh
    eig_vals, eig_vecs = eigh(COV)
    [24]:
     
    EIG_VEC = pd.DataFrame(eig_vecs, index=COV.index, columns=COV.index)
    EIG_VAL = pd.DataFrame(eig_vals, index=COV.index, columns=['eig_val'])
    EIG_VAL.index.name = 'term_str'
    [25]:
     
    EIG_VEC.iloc[:10, :10].style.background_gradient(cmap=colors)
    [25]:
    term_str is are have be do know was get dont were
    term_str                    
    is -0.008740 0.014468 -0.010531 0.023358 -0.019501 0.047186 -0.058522 0.059552 0.051914 0.059551
    are 0.001514 0.018318 -0.013984 0.051993 -0.038404 0.103738 -0.106298 0.064809 0.054640 0.005299
    have 0.000613 0.001905 0.000246 -0.015984 0.023679 -0.040493 0.056038 -0.027658 0.007505 0.045481
    be -0.001078 -0.000148 0.001307 0.001813 -0.000114 0.001185 -0.001046 -0.000276 0.002085 -0.001345
    do -0.006433 -0.004145 -0.007857 0.015346 0.004382 0.023060 -0.011600 0.009771 0.012486 -0.008574
    know -0.003377 0.006185 0.003028 0.018620 -0.018843 0.027605 -0.019937 0.006014 0.009631 -0.022318
    was -0.005365 0.017261 -0.001275 0.018683 -0.002109 0.016016 -0.032710 0.016248 0.029140 0.023536
    get 0.008093 -0.013423 -0.006766 -0.016630 0.005738 -0.015421 0.001010 0.030972 0.028472 0.008850
    dont -0.009942 0.010504 -0.012400 -0.005770 0.013332 0.009641 -0.001330 0.008068 -0.006390 0.000242
    were 0.002320 0.006897 -0.006384 0.005637 0.001236 -0.002112 -0.012179 -0.001382 -0.008316 -0.007144
    [26]:
     
    EIG_VEC_PAIRS = EIG_VEC.stack().sort_values(ascending=False).to_frame('covariance')
    EIG_VEC_PAIRS.index.names = ['term1', 'term2']
    EIG_VEC_PAIRS.head(20)
    [26]:
    covariance
    term1 term2
    more appreciates 0.993390
    continued appreciative 0.991010
    tennis genuine 0.608465
    os andi 0.559930
    phone andi 0.475615
    office andthen 0.452633
    bomb geek 0.434562
    s anchoring 0.424377
    title antagonize 0.418314
    fries teammates 0.403636
    hotter girlish 0.402058
    loan flattery 0.391117
    is andmr 0.381882
    ticket apples 0.372375
    seats fixes 0.362034
    truck have 0.350836
    statue irritated 0.348182
    office amfl 0.345516
    right amble 0.337294
    zero swear 0.337101
    [27]:
     
    EIG_VEC_PAIRS.sample(10000).sort_values('covariance', ascending=False).plot(rot=45, style='.', figsize=(10,5));

    ### Select Principal Components

    [28]:
     
    EIG_PAIRS = EIG_VAL.join(EIG_VEC.T)
    EIG_PAIRS.sort_values('eig_val', ascending=False).head(10)
    [28]:
    eig_val is are have be do know was get dont ... amble angers amfl andthen andmr andi anchoring ample appreciates appreciative
    term_str
    appreciative 0.083012 -0.036719 -0.021617 -0.018914 -0.018489 -0.019728 -0.015675 -0.022285 -0.016027 -0.015902 ... -0.000027 -0.000104 -0.000109 -0.000170 -0.000088 -0.000215 -0.000053 -0.000196 -0.000149 -0.000148
    appreciates 0.018867 -0.032156 -0.012144 -0.003615 0.000054 -0.010020 -0.002721 -0.010583 -0.007981 -0.004658 ... -0.000043 0.000114 -0.000175 -0.000299 -0.000123 -0.000317 -0.000090 -0.000337 -0.000207 -0.000253
    ample 0.008506 -0.081056 -0.066555 -0.120606 -0.101684 -0.153571 -0.154922 -0.199971 -0.067697 -0.039216 ... 0.000099 -0.002738 -0.000473 0.000843 -0.000113 -0.000673 0.000212 0.001564 -0.003335 -0.000869
    anchoring 0.005048 -0.098747 -0.089643 -0.140463 -0.150119 -0.193075 -0.157450 -0.276807 -0.093192 -0.255633 ... -0.000189 0.001625 -0.000636 0.002254 -0.001338 -0.002328 0.000062 0.002040 0.000097 -0.002933
    andi 0.004505 0.315112 0.133692 0.031232 0.015433 0.022448 -0.011864 -0.152727 -0.018499 -0.023347 ... -0.000175 0.000442 0.000435 0.001667 0.000240 -0.002409 0.000673 -0.001708 0.000011 -0.000950
    andmr 0.003949 0.381882 0.140767 0.041358 0.039815 -0.082378 0.025372 -0.257427 0.016483 -0.016768 ... 0.000014 0.000324 0.001384 0.004030 0.000604 -0.001966 0.000379 0.000199 0.001144 0.001868
    andthen 0.003755 -0.166583 -0.187275 -0.073125 -0.076670 -0.014912 0.014580 0.284799 -0.041987 -0.093919 ... -0.000050 0.000170 -0.000922 -0.000850 0.000279 -0.003045 0.000749 -0.002401 -0.000189 0.002314
    amfl 0.003695 0.034432 -0.015104 0.034329 -0.011472 0.082615 0.083550 -0.028582 0.037542 0.092059 ... -0.000008 -0.000310 0.002278 0.002880 -0.000675 0.005466 0.000391 0.001184 0.002373 -0.003623
    angers 0.003542 0.094634 -0.065145 -0.030695 0.019852 -0.066882 -0.031029 0.084035 -0.060636 0.041990 ... 0.000219 0.000049 0.000115 0.003336 0.000495 0.000302 -0.000106 0.001895 0.000641 0.000978
    amble 0.003458 -0.019989 0.062340 0.028161 0.030435 0.030439 0.040775 -0.018104 -0.004505 0.026517 ... -0.000106 0.000272 0.001943 -0.001266 0.000312 0.002206 -0.000888 -0.001092 -0.001645 0.003131

    10 rows × 5001 columns

    [29]:
     
    EIG_PAIRS['exp_var'] = np.round((EIG_PAIRS.eig_val / EIG_PAIRS.eig_val.sum()) * 100, 2)
    EIG_PAIRS.exp_var.sort_values(ascending=False).head().plot.bar(rot=45);
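The explained-variance column is just each eigenvalue as a percentage of the eigenvalue total; a cumulative sum shows how much variance the top components jointly capture. A minimal sketch with toy eigenvalues (illustrative numbers, not the corpus spectrum):

```python
import numpy as np
import pandas as pd

# Toy eigenvalue spectrum standing in for EIG_PAIRS.eig_val
eig_val = pd.Series([0.083, 0.019, 0.0085, 0.005, 0.0045],
                    index=[f"PC{i}" for i in range(5)])

exp_var = np.round(eig_val / eig_val.sum() * 100, 2)  # percent per component
cum_var = exp_var.cumsum()                            # running share captured
```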

    Picking Top K Components¶

    [30]:
     
COMPS = EIG_PAIRS.sort_values('exp_var', ascending=False).head(10).reset_index(drop=True)
COMPS.index = [f"PC{i}" for i in COMPS.index]
COMPS.index.name = 'pc_id'
    COMPS
    [30]:
    eig_val is are have be do know was get dont ... angers amfl andthen andmr andi anchoring ample appreciates appreciative exp_var
    pc_id
    PC0 0.083012 -0.036719 -0.021617 -0.018914 -0.018489 -0.019728 -0.015675 -0.022285 -0.016027 -0.015902 ... -0.000104 -0.000109 -0.000170 -0.000088 -0.000215 -0.000053 -0.000196 -0.000149 -0.000148 8.48
    PC1 0.018867 -0.032156 -0.012144 -0.003615 0.000054 -0.010020 -0.002721 -0.010583 -0.007981 -0.004658 ... 0.000114 -0.000175 -0.000299 -0.000123 -0.000317 -0.000090 -0.000337 -0.000207 -0.000253 1.93
    PC2 0.008506 -0.081056 -0.066555 -0.120606 -0.101684 -0.153571 -0.154922 -0.199971 -0.067697 -0.039216 ... -0.002738 -0.000473 0.000843 -0.000113 -0.000673 0.000212 0.001564 -0.003335 -0.000869 0.87
    PC3 0.005048 -0.098747 -0.089643 -0.140463 -0.150119 -0.193075 -0.157450 -0.276807 -0.093192 -0.255633 ... 0.001625 -0.000636 0.002254 -0.001338 -0.002328 0.000062 0.002040 0.000097 -0.002933 0.52
    PC4 0.004505 0.315112 0.133692 0.031232 0.015433 0.022448 -0.011864 -0.152727 -0.018499 -0.023347 ... 0.000442 0.000435 0.001667 0.000240 -0.002409 0.000673 -0.001708 0.000011 -0.000950 0.46
    PC5 0.003949 0.381882 0.140767 0.041358 0.039815 -0.082378 0.025372 -0.257427 0.016483 -0.016768 ... 0.000324 0.001384 0.004030 0.000604 -0.001966 0.000379 0.000199 0.001144 0.001868 0.40
    PC6 0.003755 -0.166583 -0.187275 -0.073125 -0.076670 -0.014912 0.014580 0.284799 -0.041987 -0.093919 ... 0.000170 -0.000922 -0.000850 0.000279 -0.003045 0.000749 -0.002401 -0.000189 0.002314 0.38
    PC7 0.003695 0.034432 -0.015104 0.034329 -0.011472 0.082615 0.083550 -0.028582 0.037542 0.092059 ... -0.000310 0.002278 0.002880 -0.000675 0.005466 0.000391 0.001184 0.002373 -0.003623 0.38
    PC8 0.003542 0.094634 -0.065145 -0.030695 0.019852 -0.066882 -0.031029 0.084035 -0.060636 0.041990 ... 0.000049 0.000115 0.003336 0.000495 0.000302 -0.000106 0.001895 0.000641 0.000978 0.36
    PC9 0.003458 -0.019989 0.062340 0.028161 0.030435 0.030439 0.040775 -0.018104 -0.004505 0.026517 ... 0.000272 0.001943 -0.001266 0.000312 0.002206 -0.000888 -0.001092 -0.001645 0.003131 0.35

    10 rows × 5002 columns
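Keeping the top-K rows by eigenvalue is the standard PCA truncation. As a sanity check, a sketch on synthetic data (not the screenplay matrix) does the same selection with numpy's `eigh`, which returns eigenvalues in ascending order, and then projects the data onto the retained components:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] += 3 * X[:, 0]          # inject one dominant direction
Xc = X - X.mean(axis=0)         # center before computing covariance

cov = np.cov(Xc, rowvar=False)
eig_val, eig_vec = np.linalg.eigh(cov)   # eigh: eigenvalues ascending

# Sort descending and keep the top K, mirroring the head(10) above
order = np.argsort(eig_val)[::-1]
K = 2
top_val = eig_val[order][:K]
top_vec = eig_vec[:, order][:, :K]

# Project data onto the retained components (the later DCM step)
scores = Xc @ top_vec
```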

Loadings¶
    [31]:
     
    LOADINGS = COMPS[COV.index].T
    LOADINGS.index.name = 'term_str'
    LOADINGS.head(10).style.background_gradient(cmap=colors)
    [31]:
    pc_id PC0 PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
    term_str                    
    is -0.036719 -0.032156 -0.081056 -0.098747 0.315112 0.381882 -0.166583 0.034432 0.094634 -0.019989
    are -0.021617 -0.012144 -0.066555 -0.089643 0.133692 0.140767 -0.187275 -0.015104 -0.065145 0.062340
    have -0.018914 -0.003615 -0.120606 -0.140463 0.031232 0.041358 -0.073125 0.034329 -0.030695 0.028161
    be -0.018489 0.000054 -0.101684 -0.150119 0.015433 0.039815 -0.076670 -0.011472 0.019852 0.030435
    do -0.019728 -0.010020 -0.153571 -0.193075 0.022448 -0.082378 -0.014912 0.082615 -0.066882 0.030439
    know -0.015675 -0.002721 -0.154922 -0.157450 -0.011864 0.025372 0.014580 0.083550 -0.031029 0.040775
    was -0.022285 -0.010583 -0.199971 -0.276807 -0.152727 -0.257427 0.284799 -0.028582 0.084035 -0.018104
    get -0.016027 -0.007981 -0.067697 -0.093192 -0.018499 0.016483 -0.041987 0.037542 -0.060636 -0.004505
    dont -0.015902 -0.004658 -0.039216 -0.255633 -0.023347 -0.016768 -0.093919 0.092059 0.041990 0.026517
    were -0.014546 -0.010964 -0.045979 -0.177439 -0.054759 0.011618 -0.022318 -0.045951 -0.029490 0.044143
    [32]:
     
    top_terms = []
    for i in range(10):
        for j in [0, 1]:
            comp_str = ' '.join(LOADINGS.sort_values(f'PC{i}', ascending=bool(j)).head(10).index.to_list())
            top_terms.append((f"PC{i}", j, comp_str))
    COMP_GLOSS = pd.DataFrame(top_terms).set_index([0,1]).unstack()
    COMP_GLOSS.index.name = 'comp_id'
    COMP_GLOSS.columns = COMP_GLOSS.columns.droplevel(0) 
    COMP_GLOSS = COMP_GLOSS.rename(columns={0:'pos', 1:'neg'})
    COMP_GLOSS
    [32]:
    1 pos neg
    comp_id
    PC0 continued slots noó shown objection draw misap... is more s was are do have be t get
    PC1 more continued t lucky years twenty liked s co... is os phone are office room front were was car
    PC2 car phone office os vo front kitchen home driv... s t don was re m know do have gonna
    PC3 s t car house door re don sits waiting front was dont do were know be have beat did had
    PC4 os phone is are cheerleaders rings map s stand... was car vo did were food nods pulls said day
    PC5 is office computer are door vo day students ri... os was did cheerleaders t do car watch drive k...
    PC6 office phone was day students did had same sai... are is car looks map standing dont watch line ...
    PC7 office day time car home door front watch look... students computer right left male news bunch s...
    PC8 looks is map was beat students right door loan... vo phone car os people customers do are get voice
    PC9 right office students car watch computer os ch... phone news hear house door did title anything ...
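The gloss loop above sorts the full loadings table twice per component; `nlargest`/`nsmallest` express the same "top positive / top negative terms" idea more directly. A sketch on a hypothetical 4-term loadings table (term names and values are illustrative):

```python
import pandas as pd

# Toy stand-in for LOADINGS (terms x components)
LOAD = pd.DataFrame(
    {'PC0': [0.9, -0.8, 0.1, -0.2],
     'PC1': [0.05, 0.3, -0.7, 0.6]},
    index=pd.Index(['alpha', 'beta', 'gamma', 'delta'], name='term_str'))

# For each component, join the 2 most positive and 2 most negative terms
gloss = pd.DataFrame({
    'pos': {pc: ' '.join(LOAD[pc].nlargest(2).index) for pc in LOAD.columns},
    'neg': {pc: ' '.join(LOAD[pc].nsmallest(2).index) for pc in LOAD.columns},
})
```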

    DCM¶

    [33]:
     
    DCM = TFIDF_L2.dot(COMPS[COV.index].T) 
    DCM = DCM.join(LIB[LIB_COLS], on='screenplay_id')
    [34]:
     
DCM['doc'] = DCM.apply(lambda x: f"{x.title} {str(x.name[1]).zfill(2)}", axis=1)
    DCM.doc
    [34]:
    screenplay_id       scene_id
    joy                 1                           Joy 01
                        2                           Joy 02
                        4                           Joy 04
                        6                           Joy 06
                        7                           Joy 07
                                             ...          
    the_social_network  569         The Social Network 569
                        572         The Social Network 572
                        573         The Social Network 573
                        574         The Social Network 574
                        575         The Social Network 575
    Name: doc, Length: 1868, dtype: object
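Inside `apply(..., axis=1)`, `x.name` is the row's index value, here the `(screenplay_id, scene_id)` tuple, which is why `x.name[1]` picks out the scene number. A minimal sketch with a hypothetical two-level index:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('joy', 1), ('joy', 2), ('the_social_network', 569)],
    names=['screenplay_id', 'scene_id'])
df = pd.DataFrame({'title': ['Joy', 'Joy', 'The Social Network']}, index=idx)

# x.name is the (screenplay_id, scene_id) tuple for each row;
# zfill(2) left-pads single-digit scene ids so labels sort nicely
df['doc'] = df.apply(lambda x: f"{x.title} {str(x.name[1]).zfill(2)}", axis=1)
```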
    [35]:
     
    DCM.head()
    [35]:
    PC0 PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 title genre year doc
    screenplay_id scene_id
    joy 1 -0.009742 -0.008930 0.057842 0.044163 -0.012808 -0.088922 -0.105269 0.061762 0.075649 0.080738 Joy comdey/drama 2015 Joy 01
    2 -0.026036 0.035024 -0.077992 -0.172449 0.015656 -0.044006 -0.114761 0.045198 0.023700 0.105914 Joy comdey/drama 2015 Joy 02
    4 -0.005343 -0.007115 0.026311 0.068166 -0.069754 -0.057152 -0.063562 0.058273 -0.079122 0.118392 Joy comdey/drama 2015 Joy 04
    6 -0.013547 -0.014493 0.063510 0.127740 -0.068335 -0.095264 -0.138851 0.174558 -0.058543 0.340153 Joy comdey/drama 2015 Joy 06
    7 -0.009002 -0.008628 0.033626 0.067546 -0.039601 -0.033108 -0.082383 0.081205 0.000210 0.124174 Joy comdey/drama 2015 Joy 07
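The DCM cell is a single matrix multiply: the L2-normalized document-term matrix times the term-by-component loadings yields document coordinates in PC space. A sketch with toy matrices (shapes, names, and values are illustrative stand-ins for `TFIDF_L2` and the loadings):

```python
import numpy as np
import pandas as pd

# Toy document-term matrix: 3 docs x 4 terms
tfidf = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                     columns=['is', 'are', 'have', 'be'])
# L2-normalize rows so each document is a unit vector, as in TFIDF_L2
tfidf = tfidf.div(np.sqrt((tfidf ** 2).sum(axis=1)), axis=0)

# Toy term-by-component loadings: 4 terms x 2 components
loadings = pd.DataFrame(np.full((4, 2), 0.5),
                        index=tfidf.columns, columns=['PC0', 'PC1'])

dcm = tfidf.dot(loadings)  # docs x components, one row per scene
```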

    PCA Visualizations¶

    [36]:
    def vis_pcs(M, a, b, label='title', hover_name='genre', symbol=None, size=None):
        M = M.reset_index()
        return px.scatter(
            M, f"PC{a}", f"PC{b}",
            color=label,
            hover_name=hover_name,
            symbol=symbol if symbol in M.columns else None,
            size=size if size in M.columns else None,
            marginal_x='box', height=800
        )
    def vis_loadings(a=0, b=1, hover_name='term_str'):
        #X = LOADINGS.join(VOCAB)
        X = LOADINGS.join(VSHORT)
        return px.scatter(X.reset_index(), f"PC{a}", f"PC{b}", 
                          text='term_str', size='i', color='max_pos_group', 
                          marginal_x='box', height=800)
    [41]:
     
    DCM
    [41]:
    PC0 PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 title genre year doc
    screenplay_id scene_id
    joy 1 -0.009742 -0.008930 0.057842 0.044163 -0.012808 -0.088922 -0.105269 0.061762 0.075649 0.080738 Joy comdey/drama 2015 Joy 01
    2 -0.026036 0.035024 -0.077992 -0.172449 0.015656 -0.044006 -0.114761 0.045198 0.023700 0.105914 Joy comdey/drama 2015 Joy 02
    4 -0.005343 -0.007115 0.026311 0.068166 -0.069754 -0.057152 -0.063562 0.058273 -0.079122 0.118392 Joy comdey/drama 2015 Joy 04
    6 -0.013547 -0.014493 0.063510 0.127740 -0.068335 -0.095264 -0.138851 0.174558 -0.058543 0.340153 Joy comdey/drama 2015 Joy 06
    7 -0.009002 -0.008628 0.033626 0.067546 -0.039601 -0.033108 -0.082383 0.081205 0.000210 0.124174 Joy comdey/drama 2015 Joy 07
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    the_social_network 569 -0.040815 -0.024609 -0.139333 -0.186990 0.060854 0.099896 -0.030740 0.066110 -0.025069 0.040497 The Social Network drama/historical_fiction 2009 The Social Network 569
    572 -0.030844 -0.016715 -0.100461 -0.139302 0.049060 0.142616 0.076705 0.102582 0.011328 0.077599 The Social Network drama/historical_fiction 2009 The Social Network 572
    573 -0.020800 -0.011092 -0.080757 -0.107443 -0.035434 -0.023990 -0.002382 0.025048 -0.016980 0.026415 The Social Network drama/historical_fiction 2009 The Social Network 573
    574 -0.020562 -0.013363 -0.064003 -0.057032 0.025329 0.071072 -0.034069 -0.057014 -0.008166 0.041210 The Social Network drama/historical_fiction 2009 The Social Network 574
    575 -0.014515 -0.011432 -0.010473 0.001128 0.028815 0.074049 -0.043474 -0.013858 -0.000349 -0.036334 The Social Network drama/historical_fiction 2009 The Social Network 575

    1868 rows × 14 columns

    [37]:
     
    DCM = DCM.dropna()
    vis_pcs(DCM, 0, 1)
[Interactive Plotly scatter: documents projected onto PC0 vs PC1, colored by title]
    [40]:
     
    vis_loadings(0, 1)
[Interactive Plotly scatter: term loadings on PC0 vs PC1]
    [ ]:
     
    TFIDF_L2.to_csv(f"{output_dir}/{data_prefix}-TFIDF_chap_L2.csv")
    DCM.iloc[:,:10].to_csv(f"{output_dir}/{data_prefix}-PCA_DCM_chap.csv")
    COMPS.iloc[:,[0,-1]].to_csv(f"{output_dir}/{data_prefix}-PCA_COMPS_chap.csv")
    LOADINGS.to_csv(f"{output_dir}/{data_prefix}-PCA_TCM_chap.csv")
    LIB.to_csv(f"{output_dir}/{data_prefix}-LIB.csv")
    [44]:
     
import matplotlib.pyplot as plt
import seaborn as sns

def vis_pcs_matplotlib(M, a, b, label='title', hover_name='genre', symbol=None, size=None):
    df = M.reset_index()

    plt.figure(figsize=(16, 12))

    # Use seaborn scatterplot to support grouping by hue/style/size
    sns.scatterplot(
        data=df,
        x=f"PC{a}",
        y=f"PC{b}",
        hue=label,
        style=symbol if symbol and symbol in df.columns else None,
        size=size if size and size in df.columns else None,
        sizes=(20, 200),
        alpha=0.8
    )

    # Optionally annotate points with the doc label if available
    if 'doc' in df.columns:
        for _, row in df.iterrows():
            plt.text(row[f"PC{a}"], row[f"PC{b}"], str(row['doc']),
                     fontsize=8, ha='center', va='bottom')

    plt.xlabel(f"PC{a}")
    plt.ylabel(f"PC{b}")
    plt.title(f"PC{a} vs PC{b} Projection")
    plt.grid(True)
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    plt.show()
    [45]:
     
    vis_pcs_matplotlib(DCM, 0, 1, label='title', hover_name='genre', symbol='genre', size=None)
    [48]:
     
    vis_pcs_matplotlib(DCM, 2, 3, label='title', hover_name='genre', symbol='genre', size=None)
    [50]:
     
%pip install adjustText
    Defaulting to user installation because normal site-packages is not writeable
Collecting adjustText
  Downloading adjustText-1.3.0-py3-none-any.whl.metadata (3.1 kB)
Requirement already satisfied: numpy, matplotlib, scipy (and their dependencies) in /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/site-packages
    Downloading adjustText-1.3.0-py3-none-any.whl (13 kB)
    Installing collected packages: adjustText
    Successfully installed adjustText-1.3.0
    Note: you may need to restart the kernel to use updated packages.
    
    [51]:
     
import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text

def vis_loadings_matplotlib(LOADINGS, VSHORT, a=0, b=1, size_col='i', label_col='term_str', color_col='max_pos_group'):
    # Merge vocab metadata and reset index
    df = LOADINGS.join(VSHORT).reset_index()

    # Set up figure
    plt.figure(figsize=(14, 10))
    sns.scatterplot(
        data=df,
        x=f"PC{a}", y=f"PC{b}",
        size=size_col, sizes=(20, 300),
        hue=color_col,
        palette='tab10', alpha=0.7,
        legend='brief'
    )

    # Add one text label per term
    texts = []
    for _, row in df.iterrows():
        texts.append(plt.text(
            row[f"PC{a}"], row[f"PC{b}"], row[label_col],
            fontsize=9, ha='center', va='bottom'))

    # Nudge labels apart to reduce overlap
    adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))

    # Labels and layout
    plt.title(f"Loadings Scatter (PC{a} vs PC{b})")
    plt.xlabel(f"PC{a}")
    plt.ylabel(f"PC{b}")
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    plt.show()
    [53]:
     
    vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)
[adjust_text debug output (candidate label positions) omitted; matplotlib loadings scatter for PC2 vs PC3 rendered here]
    3503 [-0.3551913  -0.61930597]
    4302 [ 0.08829252 -0.74814942]
    3583 [-0.09653308 -0.09750654]
    3953 [-0.34862318 -0.02711016]
    3024 [-0.46905411  0.93977034]
    3424 [-0.70970272 -0.27927367]
    3826 [-0.35299485 -0.083738  ]
    3964 [ 0.2766346  -0.79972144]
    4170 [0.10799789 0.31893396]
    4236 [-0.31123621  0.67208316]
    4295 [ 0.63331387 -0.07617358]
    4421 [ 0.0505067  -0.02328959]
    4515 [0.12036412 0.08185513]
    3223 [-0.29078908  0.42767299]
    3400 [-0.64123847 -0.49869044]
    3414 [ 0.9218675  -0.72381418]
    3198 [-0.87518935  0.76795821]
    3577 [-0.20445424  0.52870998]
    3797 [0.00484254 0.9852143 ]
    3816 [-0.87272877 -0.02436954]
    4125 [-0.14162964 -0.36593132]
    4257 [0.48549615 0.25329862]
    4346 [ 0.35534084 -0.86863862]
    4364 [-0.78903479 -0.48844712]
    4451 [ 0.82702314 -0.03718974]
    4513 [ 0.11678999 -0.75079   ]
    3122 [ 0.68269447 -0.46558854]
    4013 [-0.92433102  0.74549778]
    4259 [-0.85144337  0.94016124]
    4319 [-0.05083005 -0.45407325]
    4341 [-0.23564774  0.36984302]
    4482 [0.65225067 0.73848205]
    4538 [ 0.61833144 -0.91835714]
    3263 [ 0.81896224 -0.45207886]
    3372 [ 0.45349186 -0.80871869]
    4644 [0.35877424 0.81011138]
    4667 [-0.07783209 -0.42450347]
    3090 [ 0.92414308 -0.53617526]
    3287 [-0.21595458 -0.86661034]
    3782 [ 0.65104993 -0.30528005]
    4191 [ 0.38158549 -0.4746144 ]
    3641 [ 0.85765848 -0.0285267 ]
    4327 [ 0.8363687  -0.95362025]
    4546 [ 0.43348567 -0.24279406]
    3546 [0.86142783 0.22342867]
    3789 [ 0.33187945 -0.65832425]
    3056 [-0.02416343 -0.58444468]
    4462 [0.47578495 0.35926309]
    4742 [ 0.16192358 -0.36631342]
    4832 [0.70245301 0.01586195]
    4904 [ 0.47575639 -0.27186078]
    3035 [ 0.90075053 -0.88323135]
    3476 [-0.44886444 -0.75501038]
    3720 [ 0.88921037 -0.294986  ]
    3780 [-0.31701992 -0.58998076]
    4002 [0.73558321 0.08043016]
    4137 [ 0.76613895 -0.77960486]
    4138 [ 0.64524965 -0.87446678]
    4540 [0.33931369 0.83870824]
    3135 [-0.80522868  0.46960789]
    3290 [-0.83142714  0.09187261]
    3428 [-0.59771    -0.91989489]
    3937 [0.91192318 0.70041233]
    3313 [ 0.4089614  -0.51877302]
    3334 [0.82487904 0.9611386 ]
    3064 [0.93479579 0.55825234]
    3071 [-0.71966111 -0.63456708]
    3966 [-0.86731477 -0.72206421]
    4126 [0.69687279 0.61692296]
    4550 [-0.98975712 -0.48110438]
    3443 [0.58636951 0.26337105]
    3773 [0.99568879 0.69361529]
    3904 [ 0.39210158 -0.46076228]
    3171 [-0.76269226  0.10685422]
    3328 [ 0.7366004  -0.43274592]
    3835 [0.74345849 0.33364573]
    3908 [ 0.70957642 -0.82664654]
    3815 [ 0.74532475 -0.99362959]
    4330 [-0.37867987 -0.7526962 ]
    3131 [0.24567739 0.02918909]
    3200 [-0.69870175  0.49872993]
    3571 [-0.75321926 -0.54531474]
    3606 [-0.73601909 -0.9499221 ]
    4438 [ 0.93796408 -0.56315222]
    3216 [-0.69442081  0.26179571]
    3341 [0.17684475 0.72198058]
    3929 [-0.1829965  -0.67987147]
    4473 [-0.31566734 -0.89424965]
    3092 [ 0.11078013 -0.27830361]
    3792 [-0.29454537  0.99241894]
    3972 [-0.99020314  0.96334293]
    3076 [ 0.30861875 -0.66650839]
    4240 [-0.14211826  0.36838125]
    3152 [-0.81209438  0.70763161]
    3618 [-0.0965405   0.60132142]
    3865 [ 0.89880819 -0.79504713]
    4134 [ 0.45969036 -0.51700163]
    3180 [-0.81718752 -0.45742718]
    3987 [-0.24608597  0.97096467]
    4209 [0.77754745 0.39425428]
    4237 [0.67512466 0.44469638]
    3261 [0.96269777 0.94137223]
    3530 [0.40105703 0.500993  ]
    3975 [-0.49644236  0.21282784]
    3277 [-0.6080488  -0.22644799]
    3585 [0.05973749 0.01411777]
    4143 [ 0.23964717 -0.5584029 ]
    4282 [-0.1196725  -0.17187243]
    4300 [0.79294475 0.64982047]
    3952 [0.25912551 0.04386337]
    4445 [ 0.95038686 -0.03779606]
    3289 [ 0.69114383 -0.54855183]
    3435 [0.45998955 0.26598708]
    3588 [0.52674554 0.83607259]
    4437 [0.71053297 0.53100755]
    4020 [-0.89003339  0.45220293]
    4424 [ 0.87947318 -0.04559412]
    4663 [-0.73198264 -0.06560855]
    4972 [-0.4225516  -0.00455398]
    3902 [-0.40896859  0.44663245]
    4167 [-0.90194722  0.37277492]
    4381 [-0.67134893  0.99810533]
    3304 [-0.32792547  0.31298547]
    3752 [ 0.18167856 -0.68753018]
    3084 [-0.78440569 -0.60323872]
    3481 [0.31266563 0.32505241]
    3114 [-0.40194192 -0.87855715]
    3346 [-0.87452377  0.69713677]
    3494 [0.60980354 0.13694393]
    3616 [ 0.87194758 -0.84558804]
    4195 [-0.45131442 -0.12461063]
    4339 [-0.80996119 -0.66707876]
    4340 [ 0.61628545 -0.10086384]
    3314 [-0.29704362  0.91870584]
    3382 [-0.27697262 -0.72516382]
    3790 [ 0.63654318 -0.12875614]
    3872 [ 0.37387005 -0.66004581]
    4517 [0.99893835 0.6315212 ]
    3118 [ 0.80463643 -0.55193734]
    4504 [0.47677462 0.4749075 ]
    3556 [ 0.71326087 -0.27234797]
    4253 [-0.89698202  0.61938065]
    3454 [-0.29820651  0.88939573]
    3880 [-0.81552291 -0.14349785]
    3233 [ 0.62048795 -0.39811256]
    3368 [-0.99486665 -0.08796497]
    3706 [-0.36059792 -0.14142927]
    3725 [-0.15891506 -0.77009939]
    3764 [ 0.00047841 -0.23759048]
    3255 [-0.51672908  0.70090976]
    4368 [0.68556123 0.06415327]
    4079 [-0.18092837  0.17781876]
    4093 [-0.97286224  0.22659008]
    3043 [-0.40041489 -0.90174127]
    3639 [0.58425725 0.18964066]
    4468 [0.20934995 0.54674683]
    3696 [0.80253368 0.58216247]
    3891 [0.96215764 0.40754959]
    4432 [-0.62376779 -0.02354909]
    4460 [ 0.59461042 -0.19328171]
    3516 [0.42617051 0.56216768]
    3627 [-0.4630203  -0.21063809]
    3245 [-0.34052333  0.7041012 ]
    3452 [-0.15182011 -0.94676982]
    4297 [-0.29312327  0.44147509]
    3134 [0.88552613 0.28388322]
    3659 [ 0.430062   -0.64756554]
    3455 [-0.29531238  0.62827556]
    4503 [-0.16968341  0.73126124]
    3161 [ 0.96532908 -0.09607339]
    3462 [-0.61124273  0.30597944]
    4450 [0.55492836 0.53817256]
    3176 [-0.81896186 -0.59963371]
    3189 [-0.38059589 -0.82322981]
    3617 [0.20052836 0.61628671]
    3598 [ 0.7570935  -0.47456021]
    3783 [-0.72809369 -0.96331216]
    3822 [-0.26584832 -0.69191085]
    4336 [0.16837115 0.47156434]
    4397 [-0.0474948  -0.02781288]
    3318 [-0.19776036 -0.04720657]
    3506 [-0.50299954 -0.65118959]
    3761 [ 0.94551588 -0.4386438 ]
    4448 [-0.04590342  0.63111224]
    3050 [ 0.82296517 -0.03612126]
    3088 [-0.3495818  -0.50647538]
    3101 [ 0.3558828  -0.14773893]
    3361 [ 0.66861558 -0.85906863]
    3949 [ 0.8304011  -0.77251592]
    4280 [-0.54417133 -0.45224593]
    4435 [ 0.71491901 -0.56346858]
    3484 [ 0.8287521  -0.75301858]
    3676 [ 0.60549545 -0.69735357]
    2440 [-0.53957632 -0.22942914]
    2983 [-0.13332021 -0.24421727]
    3249 [0.21779611 0.31764502]
    3544 [0.79445505 0.31202144]
    3692 [ 0.71450698 -0.94629696]
    3285 [ 0.02530396 -0.92210224]
    3515 [-0.19556633 -0.15858316]
    4140 [-0.61122528 -0.50150195]
    3244 [-0.27444302 -0.98634791]
    3301 [0.19822576 0.59403236]
    4320 [-0.27704861 -0.71028403]
    3063 [-0.89455552  0.40183994]
    3167 [ 0.34234796 -0.91655032]
    3473 [0.04853236 0.79793906]
    3723 [-0.47784987  0.91296671]
    4423 [ 0.2181828  -0.36350861]
    3863 [0.54770086 0.11576876]
    4485 [ 0.52633789 -0.58828071]
    3055 [ 0.43578996 -0.23167322]
    4285 [ 0.89842813 -0.61197013]
    3419 [-0.17601893  0.90521949]
    3859 [ 0.76580736 -0.42926668]
    3300 [-0.21294827  0.66832715]
    3403 [-0.91259546 -0.56931719]
    3493 [0.65979347 0.86038941]
    3541 [ 0.22271944 -0.63726321]
    3038 [0.32760201 0.09872055]
    3310 [-0.67241358  0.85273571]
    3895 [0.07064462 0.21677937]
    3495 [-0.87292944 -0.02624118]
    3673 [0.31418632 0.34577326]
    3202 [-0.85528168  0.6180529 ]
    3501 [ 0.65774973 -0.94899523]
    4469 [-0.37236441  0.52678279]
    3509 [0.93467158 0.07482088]
    4156 [0.15877824 0.89421566]
    3392 [0.18560699 0.07058219]
    4371 [-0.96236804 -0.28261465]
    3133 [0.80856753 0.82422726]
    4323 [ 0.827168   -0.12050051]
    3184 [-0.79148927  0.31099274]
    3728 [0.34521357 0.92531626]
    3892 [-0.78851837 -0.74732676]
    3959 [-0.98405238  0.49648934]
    3113 [-0.53197032  0.66901771]
    3181 [-0.31845806 -0.15379574]
    3615 [-0.29076253 -0.21580621]
    4158 [-0.76281453  0.76995104]
    4223 [-0.17638608 -0.55210737]
    3422 [0.4494971  0.54266429]
    3502 [ 0.52175128 -0.52973193]
    3401 [-0.23860066 -0.01012478]
    4232 [0.5829964  0.82228898]
    4530 [ 0.92024538 -0.61418549]
    3680 [-0.00330652 -0.58602255]
    3788 [-0.14861414  0.94594743]
    3712 [-0.53430336  0.60321836]
    4204 [0.07334043 0.97595538]
    3357 [0.93825943 0.56341625]
    3630 [0.81592468 0.26509602]
    2311 [0.25754223 0.31700767]
    2312 [0.51160989 0.26773627]
    3340 [-0.30051849 -0.38648158]
    4345 [-0.48712613 -0.81481185]
    3886 [ 0.0395261  -0.25182748]
    4011 [-0.06039462  0.81789096]
    3434 [-0.12313875 -0.03949971]
    4014 [0.23223166 0.50072892]
    3085 [0.00899142 0.8103856 ]
    3364 [0.62748368 0.64301244]
    3520 [0.6193071  0.33203584]
    3940 [ 0.68863022 -0.69422372]
    3139 [-0.01194198  0.59450435]
    3907 [0.87716148 0.62997541]
    3881 [-0.3996043  -0.12283224]
    3993 [0.16314199 0.28382574]
    3733 [-0.67650426  0.29919234]
    4262 [0.78480325 0.34491059]
    3839 [-0.76194024  0.98090704]
    4274 [-0.28878246 -0.98801815]
    3770 [ 0.23031334 -0.06404984]
    4278 [-0.35078564  0.69999072]
    1115 [0.54874355 0.70702917]
    1150 [ 0.07202918 -0.80578216]
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[53], line 1
    ----> 1 vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)
    
    Cell In[51], line 28, in vis_loadings_matplotlib(LOADINGS, VSHORT, a, b, size_col, label_col, color_col)
         23     texts.append(plt.text(
         24         row[f"PC{a}"], row[f"PC{b}"], row[label_col],
         25         fontsize=9, ha='center', va='bottom'))
         27 # Reduce overlap
    ---> 28 adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))
         30 # Labels and layout
         31 plt.title(f"Loadings Scatter (PC{a} vs PC{b})")
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:724, in adjust_text(texts, x, y, objects, target_x, target_y, avoid_self, prevent_crossings, force_text, force_static, force_pull, force_explode, pull_threshold, expand, max_move, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
        721 while error > 0:
        722     # expand = expands[min(i, expand_steps-1)]
        723     logger.debug(step)
    --> 724     coords, error = iterate(
        725         coords,
        726         target_xy_disp_coord,
        727         static_coords,
        728         force_text=force_text,
        729         force_static=force_static,
        730         force_pull=force_pull,
        731         pull_threshold=pull_threshold,
        732         expand=expand,
        733         max_move=max_move,
        734         bbox_to_contain=ax_bbox,
        735         only_move=only_move,
        736     )
        737     if prevent_crossings:
        738         coords = remove_crossings(coords, target_xy_disp_coord, step)
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:329, in iterate(coords, target_coords, static_coords, force_text, force_static, force_pull, pull_threshold, expand, max_move, bbox_to_contain, only_move)
        315 def iterate(
        316     coords,
        317     target_coords,
       (...)
        326     only_move={"text": "xy", "static": "xy", "explode": "xy", "pull": "xy"},
        327 ):
        328     coords = random_shifts(coords, only_move.get("explode", "xy"))
    --> 329     text_shifts_x, text_shifts_y = get_shifts_texts(
        330         expand_coords(coords, expand[0], expand[1])
        331     )
        332     if static_coords.shape[0] > 0:
        333         static_shifts_x, static_shifts_y = get_shifts_extra(
        334             expand_coords(coords, expand[0], expand[1]), static_coords
        335         )
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:169, in get_shifts_texts(coords)
        165 yoverlaps = overlap_intervals(
        166     coords[:, 2], coords[:, 3], coords[:, 2], coords[:, 3]
        167 )
        168 yoverlaps = yoverlaps[yoverlaps[:, 0] != yoverlaps[:, 1]]
    --> 169 overlaps = yoverlaps[(yoverlaps[:, None] == xoverlaps).all(-1).any(-1)]
        170 if len(overlaps) == 0:
        171     return np.zeros((coords.shape[0])), np.zeros((coords.shape[0]))
    
    AttributeError: 'bool' object has no attribute 'all'
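    The `'bool' object has no attribute 'all'` error originates in adjustText's overlap-intersection step: when one of the overlap arrays is empty (or the shapes otherwise fail to broadcast), the elementwise `==` comparison collapses to a plain Python `bool` rather than a boolean array, so the subsequent `.all(-1)` blows up. The practical fix is usually to pin `adjustText` and NumPy to mutually compatible releases. As an illustration only, here is a minimal standalone sketch of the same row-intersection logic with an explicit empty-array guard (the function name `common_pairs` is hypothetical, not part of adjustText's API):

    ```python
    import numpy as np

    def common_pairs(xoverlaps, yoverlaps):
        """Return the rows of `yoverlaps` that also appear in `xoverlaps`."""
        xoverlaps = np.asarray(xoverlaps, dtype=int).reshape(-1, 2)
        yoverlaps = np.asarray(yoverlaps, dtype=int).reshape(-1, 2)
        # Guard: with an empty operand, the broadcast `==` can degrade to a
        # bare bool on some NumPy versions, so short-circuit instead.
        if xoverlaps.shape[0] == 0 or yoverlaps.shape[0] == 0:
            return np.empty((0, 2), dtype=int)
        # (n_y, 1, 2) == (n_x, 2) broadcasts to (n_y, n_x, 2); a row matches
        # when both components agree with some row of xoverlaps.
        mask = (yoverlaps[:, None] == xoverlaps).all(-1).any(-1)
        return yoverlaps[mask]

    # e.g. common_pairs([[0, 1], [1, 2]], [[1, 2], [2, 3]]) -> array([[1, 2]])
    ```

    With the guard in place, the degenerate no-overlap case returns an empty `(0, 2)` array instead of tripping on a scalar comparison result.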
    [ ]:
     
    ​
    3200 [-0.69870175  0.49872993]
    3571 [-0.75321926 -0.54531474]
    3606 [-0.73601909 -0.9499221 ]
    4438 [ 0.93796408 -0.56315222]
    3216 [-0.69442081  0.26179571]
    3341 [0.17684475 0.72198058]
    3929 [-0.1829965  -0.67987147]
    4473 [-0.31566734 -0.89424965]
    3092 [ 0.11078013 -0.27830361]
    3792 [-0.29454537  0.99241894]
    3972 [-0.99020314  0.96334293]
    3076 [ 0.30861875 -0.66650839]
    4240 [-0.14211826  0.36838125]
    3152 [-0.81209438  0.70763161]
    3618 [-0.0965405   0.60132142]
    3865 [ 0.89880819 -0.79504713]
    4134 [ 0.45969036 -0.51700163]
    3180 [-0.81718752 -0.45742718]
    3987 [-0.24608597  0.97096467]
    4209 [0.77754745 0.39425428]
    4237 [0.67512466 0.44469638]
    3261 [0.96269777 0.94137223]
    3530 [0.40105703 0.500993  ]
    3975 [-0.49644236  0.21282784]
    3277 [-0.6080488  -0.22644799]
    3585 [0.05973749 0.01411777]
    4143 [ 0.23964717 -0.5584029 ]
    4282 [-0.1196725  -0.17187243]
    4300 [0.79294475 0.64982047]
    3952 [0.25912551 0.04386337]
    4445 [ 0.95038686 -0.03779606]
    3289 [ 0.69114383 -0.54855183]
    3435 [0.45998955 0.26598708]
    3588 [0.52674554 0.83607259]
    4437 [0.71053297 0.53100755]
    4020 [-0.89003339  0.45220293]
    4424 [ 0.87947318 -0.04559412]
    4663 [-0.73198264 -0.06560855]
    4972 [-0.4225516  -0.00455398]
    3902 [-0.40896859  0.44663245]
    4167 [-0.90194722  0.37277492]
    4381 [-0.67134893  0.99810533]
    3304 [-0.32792547  0.31298547]
    3752 [ 0.18167856 -0.68753018]
    3084 [-0.78440569 -0.60323872]
    3481 [0.31266563 0.32505241]
    3114 [-0.40194192 -0.87855715]
    3346 [-0.87452377  0.69713677]
    3494 [0.60980354 0.13694393]
    3616 [ 0.87194758 -0.84558804]
    4195 [-0.45131442 -0.12461063]
    4339 [-0.80996119 -0.66707876]
    4340 [ 0.61628545 -0.10086384]
    3314 [-0.29704362  0.91870584]
    3382 [-0.27697262 -0.72516382]
    3790 [ 0.63654318 -0.12875614]
    3872 [ 0.37387005 -0.66004581]
    4517 [0.99893835 0.6315212 ]
    3118 [ 0.80463643 -0.55193734]
    4504 [0.47677462 0.4749075 ]
    3556 [ 0.71326087 -0.27234797]
    4253 [-0.89698202  0.61938065]
    3454 [-0.29820651  0.88939573]
    3880 [-0.81552291 -0.14349785]
    3233 [ 0.62048795 -0.39811256]
    3368 [-0.99486665 -0.08796497]
    3706 [-0.36059792 -0.14142927]
    3725 [-0.15891506 -0.77009939]
    3764 [ 0.00047841 -0.23759048]
    3255 [-0.51672908  0.70090976]
    4368 [0.68556123 0.06415327]
    4079 [-0.18092837  0.17781876]
    4093 [-0.97286224  0.22659008]
    3043 [-0.40041489 -0.90174127]
    3639 [0.58425725 0.18964066]
    4468 [0.20934995 0.54674683]
    3696 [0.80253368 0.58216247]
    3891 [0.96215764 0.40754959]
    4432 [-0.62376779 -0.02354909]
    4460 [ 0.59461042 -0.19328171]
    3516 [0.42617051 0.56216768]
    3627 [-0.4630203  -0.21063809]
    3245 [-0.34052333  0.7041012 ]
    3452 [-0.15182011 -0.94676982]
    4297 [-0.29312327  0.44147509]
    3134 [0.88552613 0.28388322]
    3659 [ 0.430062   -0.64756554]
    3455 [-0.29531238  0.62827556]
    4503 [-0.16968341  0.73126124]
    3161 [ 0.96532908 -0.09607339]
    3462 [-0.61124273  0.30597944]
    4450 [0.55492836 0.53817256]
    3176 [-0.81896186 -0.59963371]
    3189 [-0.38059589 -0.82322981]
    3617 [0.20052836 0.61628671]
    3598 [ 0.7570935  -0.47456021]
    3783 [-0.72809369 -0.96331216]
    3822 [-0.26584832 -0.69191085]
    4336 [0.16837115 0.47156434]
    4397 [-0.0474948  -0.02781288]
    3318 [-0.19776036 -0.04720657]
    3506 [-0.50299954 -0.65118959]
    3761 [ 0.94551588 -0.4386438 ]
    4448 [-0.04590342  0.63111224]
    3050 [ 0.82296517 -0.03612126]
    3088 [-0.3495818  -0.50647538]
    3101 [ 0.3558828  -0.14773893]
    3361 [ 0.66861558 -0.85906863]
    3949 [ 0.8304011  -0.77251592]
    4280 [-0.54417133 -0.45224593]
    4435 [ 0.71491901 -0.56346858]
    3484 [ 0.8287521  -0.75301858]
    3676 [ 0.60549545 -0.69735357]
    2440 [-0.53957632 -0.22942914]
    2983 [-0.13332021 -0.24421727]
    3249 [0.21779611 0.31764502]
    3544 [0.79445505 0.31202144]
    3692 [ 0.71450698 -0.94629696]
    3285 [ 0.02530396 -0.92210224]
    3515 [-0.19556633 -0.15858316]
    4140 [-0.61122528 -0.50150195]
    3244 [-0.27444302 -0.98634791]
    3301 [0.19822576 0.59403236]
    4320 [-0.27704861 -0.71028403]
    3063 [-0.89455552  0.40183994]
    3167 [ 0.34234796 -0.91655032]
    3473 [0.04853236 0.79793906]
    3723 [-0.47784987  0.91296671]
    4423 [ 0.2181828  -0.36350861]
    3863 [0.54770086 0.11576876]
    4485 [ 0.52633789 -0.58828071]
    3055 [ 0.43578996 -0.23167322]
    4285 [ 0.89842813 -0.61197013]
    3419 [-0.17601893  0.90521949]
    3859 [ 0.76580736 -0.42926668]
    3300 [-0.21294827  0.66832715]
    3403 [-0.91259546 -0.56931719]
    3493 [0.65979347 0.86038941]
    3541 [ 0.22271944 -0.63726321]
    3038 [0.32760201 0.09872055]
    3310 [-0.67241358  0.85273571]
    3895 [0.07064462 0.21677937]
    3495 [-0.87292944 -0.02624118]
    3673 [0.31418632 0.34577326]
    3202 [-0.85528168  0.6180529 ]
    3501 [ 0.65774973 -0.94899523]
    4469 [-0.37236441  0.52678279]
    3509 [0.93467158 0.07482088]
    4156 [0.15877824 0.89421566]
    3392 [0.18560699 0.07058219]
    4371 [-0.96236804 -0.28261465]
    3133 [0.80856753 0.82422726]
    4323 [ 0.827168   -0.12050051]
    3184 [-0.79148927  0.31099274]
    3728 [0.34521357 0.92531626]
    3892 [-0.78851837 -0.74732676]
    3959 [-0.98405238  0.49648934]
    3113 [-0.53197032  0.66901771]
    3181 [-0.31845806 -0.15379574]
    3615 [-0.29076253 -0.21580621]
    4158 [-0.76281453  0.76995104]
    4223 [-0.17638608 -0.55210737]
    3422 [0.4494971  0.54266429]
    3502 [ 0.52175128 -0.52973193]
    3401 [-0.23860066 -0.01012478]
    4232 [0.5829964  0.82228898]
    4530 [ 0.92024538 -0.61418549]
    3680 [-0.00330652 -0.58602255]
    3788 [-0.14861414  0.94594743]
    3712 [-0.53430336  0.60321836]
    4204 [0.07334043 0.97595538]
    3357 [0.93825943 0.56341625]
    3630 [0.81592468 0.26509602]
    2311 [0.25754223 0.31700767]
    2312 [0.51160989 0.26773627]
    3340 [-0.30051849 -0.38648158]
    4345 [-0.48712613 -0.81481185]
    3886 [ 0.0395261  -0.25182748]
    4011 [-0.06039462  0.81789096]
    3434 [-0.12313875 -0.03949971]
    4014 [0.23223166 0.50072892]
    3085 [0.00899142 0.8103856 ]
    3364 [0.62748368 0.64301244]
    3520 [0.6193071  0.33203584]
    3940 [ 0.68863022 -0.69422372]
    3139 [-0.01194198  0.59450435]
    3907 [0.87716148 0.62997541]
    3881 [-0.3996043  -0.12283224]
    3993 [0.16314199 0.28382574]
    3733 [-0.67650426  0.29919234]
    4262 [0.78480325 0.34491059]
    3839 [-0.76194024  0.98090704]
    4274 [-0.28878246 -0.98801815]
    3770 [ 0.23031334 -0.06404984]
    4278 [-0.35078564  0.69999072]
    1115 [0.54874355 0.70702917]
    1150 [ 0.07202918 -0.80578216]
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[53], line 1
    ----> 1 vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)
    
    Cell In[51], line 28, in vis_loadings_matplotlib(LOADINGS, VSHORT, a, b, size_col, label_col, color_col)
         23     texts.append(plt.text(
         24         row[f"PC{a}"], row[f"PC{b}"], row[label_col],
         25         fontsize=9, ha='center', va='bottom'))
         27 # Reduce overlap
    ---> 28 adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))
         30 # Labels and layout
         31 plt.title(f"Loadings Scatter (PC{a} vs PC{b})")
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:724, in adjust_text(texts, x, y, objects, target_x, target_y, avoid_self, prevent_crossings, force_text, force_static, force_pull, force_explode, pull_threshold, expand, max_move, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
        721 while error > 0:
        722     # expand = expands[min(i, expand_steps-1)]
        723     logger.debug(step)
    --> 724     coords, error = iterate(
        725         coords,
        726         target_xy_disp_coord,
        727         static_coords,
        728         force_text=force_text,
        729         force_static=force_static,
        730         force_pull=force_pull,
        731         pull_threshold=pull_threshold,
        732         expand=expand,
        733         max_move=max_move,
        734         bbox_to_contain=ax_bbox,
        735         only_move=only_move,
        736     )
        737     if prevent_crossings:
        738         coords = remove_crossings(coords, target_xy_disp_coord, step)
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:329, in iterate(coords, target_coords, static_coords, force_text, force_static, force_pull, pull_threshold, expand, max_move, bbox_to_contain, only_move)
        315 def iterate(
        316     coords,
        317     target_coords,
       (...)
        326     only_move={"text": "xy", "static": "xy", "explode": "xy", "pull": "xy"},
        327 ):
        328     coords = random_shifts(coords, only_move.get("explode", "xy"))
    --> 329     text_shifts_x, text_shifts_y = get_shifts_texts(
        330         expand_coords(coords, expand[0], expand[1])
        331     )
        332     if static_coords.shape[0] > 0:
        333         static_shifts_x, static_shifts_y = get_shifts_extra(
        334             expand_coords(coords, expand[0], expand[1]), static_coords
        335         )
    
    File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:169, in get_shifts_texts(coords)
        165 yoverlaps = overlap_intervals(
        166     coords[:, 2], coords[:, 3], coords[:, 2], coords[:, 3]
        167 )
        168 yoverlaps = yoverlaps[yoverlaps[:, 0] != yoverlaps[:, 1]]
    --> 169 overlaps = yoverlaps[(yoverlaps[:, None] == xoverlaps).all(-1).any(-1)]
        170 if len(overlaps) == 0:
        171     return np.zeros((coords.shape[0])), np.zeros((coords.shape[0]))
    
    AttributeError: 'bool' object has no attribute 'all'
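
    The `AttributeError` above is raised inside adjustText itself (in `get_shifts_texts`), not in the notebook code: a NumPy comparison that the library expects to return an array apparently came back as a plain `bool`, which suggests a version incompatibility between the installed adjustText and NumPy rather than a bug in `vis_loadings_matplotlib`. Until the package pair is reconciled, one way to keep the plot rendering is a defensive wrapper that falls back to matplotlib's default label placement when the adjustment step fails. This is a minimal sketch under that assumption; `safe_adjust` is a hypothetical helper, not part of adjustText's API:

    ```python
    import warnings

    def safe_adjust(adjust_fn, texts, **kwargs):
        """Call a label-adjustment routine, degrading gracefully on failure.

        adjust_fn is expected to follow adjustText.adjust_text's calling
        convention (a list of text artists plus keyword options). If the
        library raises internally -- as in the AttributeError above -- the
        labels are simply left where matplotlib placed them.
        """
        try:
            return adjust_fn(texts, **kwargs)
        except AttributeError as err:
            # Library-internal failure: warn and leave labels unadjusted.
            warnings.warn(f"label adjustment skipped: {err}")
            return None
    ```

    Line 28 of `vis_loadings_matplotlib` would then become `safe_adjust(adjust_text, texts, arrowprops=dict(arrowstyle='-', color='gray'))`, so the loadings scatter still draws (with possibly overlapping labels) instead of aborting the cell.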